From: Karthik Kambatla
To: "yarn-dev@hadoop.apache.org"
Date: Mon, 31 Mar 2014 09:38:54 -0700
Subject: Re: RM ha issuses

Thanks for reporting this, Azuryy. Indeed, this is surprising. I don't quite understand how Hive works; do you mind running a vanilla MR job and verifying whether this is indeed the case?

Also, when you say you stopped the Active RM, you mean only the RM process, correct?

On Mon, Mar 31, 2014 at 3:46 AM, Azuryy Yu wrote:
> Hi,
>
> I built from trunk and configured RM HA, then submitted a Hive job with
> 11 maps in total. I stopped the active RM when 6 maps had finished.
>
> But Hive shows all map tasks restarting. This conflicts with the
> design description.
>
> job progress:
> 2014-03-31 18:44:14,088 Stage-1 map = 68%, reduce = 0%, Cumulative CPU 713.84 sec
> 2014-03-31 18:44:15,128 Stage-1 map = 68%, reduce = 0%, Cumulative CPU 722.83 sec
> 2014-03-31 18:44:16,160 Stage-1 map = 68%, reduce = 0%, Cumulative CPU 731.95 sec
> 2014-03-31 18:44:17,191 Stage-1 map = 68%, reduce = 0%, Cumulative CPU 744.17 sec
> 2014-03-31 18:44:18,220 Stage-1 map = 68%, reduce = 0%, Cumulative CPU 756.22 sec
> 2014-03-31 18:44:19,250 Stage-1 map = 68%, reduce = 0%, Cumulative CPU 762.4 sec
> 2014-03-31 18:44:20,281 Stage-1 map = 68%, reduce = 0%, Cumulative CPU 774.64 sec
> 2014-03-31 18:44:21,306 Stage-1 map = 70%, reduce = 0%, Cumulative CPU 786.49 sec
> 2014-03-31 18:44:22,334 Stage-1 map = 70%, reduce = 0%, Cumulative CPU 792.59 sec
> 2014-03-31 18:44:23,363 Stage-1 map = 73%, reduce = 0%, Cumulative CPU 807.58 sec
> 2014-03-31 18:44:24,392 Stage-1 map = 77%, reduce = 0%, Cumulative CPU 815.96 sec
> 2014-03-31 18:44:25,416 Stage-1 map = 80%, reduce = 0%, Cumulative CPU 823.83 sec
> 2014-03-31 18:44:26,443 Stage-1 map = 80%, reduce = 0%, Cumulative CPU 826.84 sec
> 2014-03-31 18:44:27,472 Stage-1 map = 82%, reduce = 0%, Cumulative CPU 832.16 sec
> 2014-03-31 18:44:28,501 Stage-1 map = 84%, reduce = 0%, Cumulative CPU 839.73 sec
> 2014-03-31 18:44:29,531 Stage-1 map = 86%, reduce = 0%, Cumulative CPU 844.45 sec
> 2014-03-31 18:44:30,564 Stage-1 map = 82%, reduce = 0%, Cumulative CPU 760.34 sec
> 2014-03-31 18:44:31,728 Stage-1 map = 0%, reduce = 0%
> 2014-03-31 18:45:06,918 Stage-1 map = 2%, reduce = 0%, Cumulative CPU 213.81 sec
> 2014-03-31 18:45:07,952 Stage-1 map = 2%, reduce = 0%, Cumulative CPU 216.83 sec
> 2014-03-31 18:45:08,979 Stage-1 map = 7%, reduce = 0%, Cumulative CPU 229.15 sec
> 2014-03-31 18:45:10,007 Stage-1 map = 11%, reduce = 0%, Cumulative CPU 244.42 sec
> 2014-03-31 18:45:11,040 Stage-1 map = 14%, reduce = 0%, Cumulative CPU 247.31 sec
> 2014-03-31 18:45:12,072 Stage-1 map = 18%, reduce = 0%, Cumulative CPU 259.5 sec
> 2014-03-31 18:45:13,105 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 274.72 sec
> 2014-03-31 18:45:14,135 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 280.76 sec
> 2014-03-31 18:45:15,170 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 292.9 sec
> 2014-03-31 18:45:16,202 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 305.16 sec
> 2014-03-31 18:45:17,233 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 314.21 sec
> 2014-03-31 18:45:18,264 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 323.34 sec
> 2014-03-31 18:45:19,294 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 335.6 sec
> 2014-03-31 18:45:20,325 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 344.71 sec
> 2014-03-31 18:45:21,355 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 353.8 sec
> 2014-03-31 18:45:22,385 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 366.06 sec
> 2014-03-31 18:45:23,415 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 375.2 sec
> 2014-03-31 18:45:24,449 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 384.28 sec
> 2014-03-31 18:45:25,481 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 396.54 sec
> 2014-03-31 18:45:26,512 Stage-1 map = 25%, reduce = 0%, Cumulative CPU 408.72 sec
> 2014-03-31 18:45:27,549 Stage-1 map = 25%, reduce = 0%, Cumulative CPU 414.69 sec
> 2014-03-31 18:45:28,582 Stage-1 map = 30%, reduce = 0%, Cumulative CPU 426.99 sec
> 2014-03-31 18:45:29,614 Stage-1 map = 32%, reduce = 0%, Cumulative CPU 439.25 sec
> 2014-03-31 18:45:30,653 Stage-1 map = 34%, reduce = 0%, Cumulative CPU 448.25 sec
> 2014-03-31 18:45:31,683 Stage-1 map = 39%, reduce = 0%, Cumulative CPU 460.5 sec
> 2014-03-31 18:45:32,723 Stage-1 map = 41%, reduce = 0%, Cumulative CPU 469.63 sec
> 2014-03-31 18:45:33,754 Stage-1 map = 43%, reduce = 0%, Cumulative CPU 478.67 sec
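
For anyone repeating this test, a small script can confirm from the Hive console output exactly where map progress went backwards. What follows is only a rough sketch and not part of Hive or YARN: the function name find_map_regressions and the regular expression are made up for illustration, and it assumes the progress lines look like the ones quoted above. Because it uses a search rather than a full-line match, it should also work on lines that carry a "> " quote prefix.

```python
import re

# Matches Hive console progress lines of the form quoted in this thread, e.g.
# 2014-03-31 18:44:30,564 Stage-1 map = 82%, reduce = 0%, Cumulative CPU 760.34 sec
# (assumed format; adjust if your Hive version prints something different)
PROGRESS = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ "
    r"(?P<stage>Stage-\d+) map = (?P<map>\d+)%"
)


def find_map_regressions(lines):
    """Return (timestamp, previous_pct, current_pct) for every drop in map progress."""
    drops = []
    last_pct = {}  # most recent map percentage seen per stage
    for line in lines:
        m = PROGRESS.search(line)
        if not m:
            continue  # skip anything that is not a progress line
        stage, pct = m.group("stage"), int(m.group("map"))
        prev = last_pct.get(stage)
        if prev is not None and pct < prev:
            drops.append((m.group("ts"), prev, pct))
        last_pct[stage] = pct
    return drops


# Four lines taken from the log in this thread, around the RM stop.
sample = [
    "2014-03-31 18:44:29,531 Stage-1 map = 86%, reduce = 0%, Cumulative CPU 844.45 sec",
    "2014-03-31 18:44:30,564 Stage-1 map = 82%, reduce = 0%, Cumulative CPU 760.34 sec",
    "2014-03-31 18:44:31,728 Stage-1 map = 0%, reduce = 0%",
    "2014-03-31 18:45:06,918 Stage-1 map = 2%, reduce = 0%, Cumulative CPU 213.81 sec",
]

for ts, prev, cur in find_map_regressions(sample):
    print(f"{ts}: map progress fell from {prev}% to {cur}%")
```

On the four sample lines this reports two drops, 86% to 82% at 18:44:30 and 82% to 0% at 18:44:31. Running the same check against a vanilla MR job's console output would need a different pattern, since that client prints progress in another format.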