Date: Fri, 4 Apr 2014 14:56:17 +0000 (UTC)
From: "Jason Lowe (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-1901) All tasks restart during RM failover on Hive

    [ https://issues.apache.org/jira/browse/YARN-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960026#comment-13960026 ]

Jason Lowe commented on YARN-1901:
----------------------------------

This appears to be a duplicate of HIVE-6638. As [~ozawa] mentioned, AMs are restarted when the RM restarts until YARN-556 is addressed. When an AM restarts, it is not automatically the case that completed tasks will be recovered -- it must be supported by the output committer. HIVE-6638 is updating Hive's OutputCommitter so it can support task recovery upon AM restart.

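The recovery rule described above can be illustrated with a small sketch. This is a toy model, not Hadoop or Hive code: the `Committer` class, its `recovery_supported` flag, and `tasks_to_rerun` are hypothetical names introduced only for illustration. It assumes that a restarted AM keeps the results of completed tasks only when the committer reports that it supports recovery, and uses the numbers from the report below (11 maps, 6 finished).

```python
from dataclasses import dataclass


@dataclass
class Committer:
    """Hypothetical stand-in for an output committer (not the Hadoop API)."""
    recovery_supported: bool


def tasks_to_rerun(all_tasks, completed, committer):
    """Return the tasks a restarted AM has to run again in this toy model.

    With recovery support, completed tasks are kept and only the
    unfinished ones rerun. Without it, every task reruns.
    """
    if committer.recovery_supported:
        return [t for t in all_tasks if t not in completed]
    return list(all_tasks)


maps = [f"map_{i}" for i in range(11)]  # 11 map tasks, as in the report
done = set(maps[:6])                    # 6 had finished when the RM was stopped

without_recovery = tasks_to_rerun(maps, done, Committer(recovery_supported=False))
with_recovery = tasks_to_rerun(maps, done, Committer(recovery_supported=True))

print(len(without_recovery))  # 11: every map starts over
print(len(with_recovery))     # 5: only the unfinished maps rerun
```

Under these assumptions, the first case matches the behaviour in the report (progress falls back to 0%), and the second is what one would expect once the committer supports recovery.
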
> All tasks restart during RM failover on Hive
> --------------------------------------------
>
>                 Key: YARN-1901
>                 URL: https://issues.apache.org/jira/browse/YARN-1901
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.4.0
>            Reporter: Fengdong Yu
>
> I built from trunk and configured RM HA, then submitted a Hive job.
> There are 11 maps in total; I stopped the active RM when 6 maps had finished,
> but Hive shows that all map tasks restart again. This conflicts with the design description.
> Job progress:
> {code}
> 2014-03-31 18:44:14,088 Stage-1 map = 68%, reduce = 0%, Cumulative CPU 713.84 sec
> 2014-03-31 18:44:15,128 Stage-1 map = 68%, reduce = 0%, Cumulative CPU 722.83 sec
> 2014-03-31 18:44:16,160 Stage-1 map = 68%, reduce = 0%, Cumulative CPU 731.95 sec
> 2014-03-31 18:44:17,191 Stage-1 map = 68%, reduce = 0%, Cumulative CPU 744.17 sec
> 2014-03-31 18:44:18,220 Stage-1 map = 68%, reduce = 0%, Cumulative CPU 756.22 sec
> 2014-03-31 18:44:19,250 Stage-1 map = 68%, reduce = 0%, Cumulative CPU 762.4 sec
> 2014-03-31 18:44:20,281 Stage-1 map = 68%, reduce = 0%, Cumulative CPU 774.64 sec
> 2014-03-31 18:44:21,306 Stage-1 map = 70%, reduce = 0%, Cumulative CPU 786.49 sec
> 2014-03-31 18:44:22,334 Stage-1 map = 70%, reduce = 0%, Cumulative CPU 792.59 sec
> 2014-03-31 18:44:23,363 Stage-1 map = 73%, reduce = 0%, Cumulative CPU 807.58 sec
> 2014-03-31 18:44:24,392 Stage-1 map = 77%, reduce = 0%, Cumulative CPU 815.96 sec
> 2014-03-31 18:44:25,416 Stage-1 map = 80%, reduce = 0%, Cumulative CPU 823.83 sec
> 2014-03-31 18:44:26,443 Stage-1 map = 80%, reduce = 0%, Cumulative CPU 826.84 sec
> 2014-03-31 18:44:27,472 Stage-1 map = 82%, reduce = 0%, Cumulative CPU 832.16 sec
> 2014-03-31 18:44:28,501 Stage-1 map = 84%, reduce = 0%, Cumulative CPU 839.73 sec
> 2014-03-31 18:44:29,531 Stage-1 map = 86%, reduce = 0%, Cumulative CPU 844.45 sec
> 2014-03-31 18:44:30,564 Stage-1 map = 82%, reduce = 0%, Cumulative CPU 760.34 sec
> 2014-03-31 18:44:31,728 Stage-1 map = 0%, reduce = 0%
> 2014-03-31 18:45:06,918 Stage-1 map = 2%, reduce = 0%, Cumulative CPU 213.81 sec
> 2014-03-31 18:45:07,952 Stage-1 map = 2%, reduce = 0%, Cumulative CPU 216.83 sec
> 2014-03-31 18:45:08,979 Stage-1 map = 7%, reduce = 0%, Cumulative CPU 229.15 sec
> 2014-03-31 18:45:10,007 Stage-1 map = 11%, reduce = 0%, Cumulative CPU 244.42 sec
> 2014-03-31 18:45:11,040 Stage-1 map = 14%, reduce = 0%, Cumulative CPU 247.31 sec
> 2014-03-31 18:45:12,072 Stage-1 map = 18%, reduce = 0%, Cumulative CPU 259.5 sec
> 2014-03-31 18:45:13,105 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 274.72 sec
> 2014-03-31 18:45:14,135 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 280.76 sec
> 2014-03-31 18:45:15,170 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 292.9 sec
> 2014-03-31 18:45:16,202 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 305.16 sec
> 2014-03-31 18:45:17,233 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 314.21 sec
> 2014-03-31 18:45:18,264 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 323.34 sec
> 2014-03-31 18:45:19,294 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 335.6 sec
> 2014-03-31 18:45:20,325 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 344.71 sec
> 2014-03-31 18:45:21,355 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 353.8 sec
> 2014-03-31 18:45:22,385 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 366.06 sec
> 2014-03-31 18:45:23,415 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 375.2 sec
> 2014-03-31 18:45:24,449 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 384.28 sec
> {code}
> I am using hive-0.12.0, and ZKRMStateRoot as the RM store class. Hive is using a simple external table (only one column).

--
This message was sent by Atlassian JIRA
(v6.2#6252)