Mailing-List: contact mapreduce-issues-help@hadoop.apache.org; run by ezmlm
Message-ID: <22867413.12341274271596848.JavaMail.jira@thor>
Date: Wed, 19 May 2010 08:19:56 -0400 (EDT)
From: "Vinay Kumar Thota (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Reply-To: mapreduce-issues@hadoop.apache.org
Subject: [jira] Commented: (MAPREDUCE-1794) Test the job status of lost task trackers before and after the timeout.
In-Reply-To: <2447471.80641274095242933.JavaMail.jira@thor>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394
X-Virus-Checked: Checked by ClamAV on apache.org

[ https://issues.apache.org/jira/browse/MAPREDUCE-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869112#action_12869112 ]

Vinay Kumar Thota commented on MAPREDUCE-1794:
----------------------------------------------

{quote}
The JobStatus should be JobStatus.FAILED instead of succeeded. If the task tracker was lost for all four attempts of a task, shouldn't the job fail instead of succeed? If that is not the case, the message in the assert has to be changed to say that the job succeeded even after losing the task tracker four times.
{quote}
[Vinay]: I think you misunderstood the functionality. If a tasktracker is lost and the timeout expires, the task attempt is marked as killed and resubmitted to another tasktracker. Even if four attempts are killed because of lost tasktrackers, the task is resubmitted to another tasktracker a fifth time, and this continues until the task succeeds. The mapred.map.max.attempts attribute does not apply to killed tasks, so the task can be attempted any number of times; max attempts applies only to failed tasks. In this case the job status should be succeeded, because the task may still succeed at some point.

{quote}
Why do we care about checking the job status for 40% completion? Also, can we enhance the building blocks to check this kind of status, since the code can be reused elsewhere?
{quote}
[Vinay]: We just wanted to make sure that the job has started and is at least 40% complete, because at least one map or reduce task must be running on the tasktracker to check the conditions.

{quote}
The above code is repeated a couple of times and can be part of a function; if it is used across test cases, it can be part of a building block.
{quote}
[Vinay]: I will refactor the code into a function. I don't think it is useful across the test cases.

{quote}
If you see the story description, we said we would suspend the task tracker and resume it, but it seems that you have followed the route of killing the task tracker instead of pausing and resuming it. I think killing should be fine, since kill/start emulates pause and resume, but on the performance side, if we had used pause and resume, the waits in the test cases could be reduced.
{quote}
[Vinay]: I am pausing by stopping the tasktracker and resuming by starting it, so I don't think there would be a performance issue.

{quote}
One general question I have: after killing the same task tracker four times, the task tracker should get blacklisted, and if you resubmit the job, the task tracker should not be used by the job tracker. Is it good to check that condition as part of this test case, or do you think this is out of scope? There is a URL that lists the blacklisted tasktrackers; if we can get the number through an aspect, it can be verified. Also, at the end of the test we need to remove the task tracker from the blacklist so that the other tests run without any problem.
{quote}
[Vinay]: As I said above, max attempts does not apply to killed tasks, so no tasktracker will be blacklisted.

> Test the job status of lost task trackers before and after the timeout.
> -----------------------------------------------------------------------
>
> Key: MAPREDUCE-1794
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1794
> Project: Hadoop Map/Reduce
> Issue Type: Task
> Components: test
> Reporter: Vinay Kumar Thota
> Assignee: Vinay Kumar Thota
> Attachments: 1794_lost_tasktracker.patch
>
>
> This test covers the following scenarios.
> 1. Verify whether the job succeeds when the task tracker is lost and comes back alive before the timeout.
> 2.
Verify whether the job succeeds and whether the number of killed attempts matches when the task trackers are lost and time out for all four attempts of a task.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
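
The retry rule described in the comment above (killed attempts do not count against the max-attempts limit; only failed attempts do) can be illustrated with a small standalone model. This is only a sketch under stated assumptions: the class LostTrackerRetrySketch, the Outcome enum, and both methods are hypothetical names made up for illustration. It does not use or reproduce any Hadoop API, and it is not the code in 1794_lost_tasktracker.patch.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model for illustration only; not Hadoop code.
class LostTrackerRetrySketch {

    // Possible results of a single task attempt in this model.
    enum Outcome { SUCCEEDED, FAILED, KILLED }

    /**
     * Walks the attempts in order and reports whether the task ends up succeeding.
     * FAILED attempts count against maxFailedAttempts; KILLED attempts
     * (for example, attempts lost with their tasktracker) are not counted,
     * so the task is simply tried again.
     */
    static boolean taskSucceeds(List<Outcome> attempts, int maxFailedAttempts) {
        int failed = 0;
        for (Outcome outcome : attempts) {
            if (outcome == Outcome.SUCCEEDED) {
                return true;
            }
            if (outcome == Outcome.FAILED) {
                failed++;
                if (failed >= maxFailedAttempts) {
                    return false; // limit reached by failed attempts only
                }
            }
            // KILLED: ignored for the limit; the next attempt is examined.
        }
        return false; // no successful attempt was observed
    }

    /** Counts the killed attempts, which the test in this issue compares against an expected value. */
    static int killedAttempts(List<Outcome> attempts) {
        int killed = 0;
        for (Outcome outcome : attempts) {
            if (outcome == Outcome.KILLED) {
                killed++;
            }
        }
        return killed;
    }

    public static void main(String[] args) {
        // Four attempts lost with their tasktracker, then a successful fifth attempt.
        List<Outcome> lostFourTimes = Arrays.asList(
                Outcome.KILLED, Outcome.KILLED, Outcome.KILLED, Outcome.KILLED,
                Outcome.SUCCEEDED);
        System.out.println(taskSucceeds(lostFourTimes, 4));   // prints "true"
        System.out.println(killedAttempts(lostFourTimes));    // prints "4"

        // Four genuinely failed attempts exhaust the limit.
        List<Outcome> failedFourTimes = Arrays.asList(
                Outcome.FAILED, Outcome.FAILED, Outcome.FAILED, Outcome.FAILED);
        System.out.println(taskSucceeds(failedFourTimes, 4)); // prints "false"
    }
}
```

In this model the first scenario matches the expectation stated in the reply: the job is expected to succeed with four killed attempts recorded, whereas four failed attempts would end the task.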