Date: Fri, 24 Jan 2014 08:21:39 +0000 (UTC)
From: "Rohith (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Subject: [jira] [Created] (MAPREDUCE-5734) Reducer preemption does not happen if node is blacklisted, intern job get hanged.

Rohith created MAPREDUCE-5734:
---------------------------------

             Summary: Reducer preemption does not happen if node is blacklisted, intern job get hanged.
                 Key: MAPREDUCE-5734
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5734
             Project: Hadoop Map/Reduce
          Issue Type: Bug
    Affects Versions: 2.2.0
         Environment: SuSE 11 SP2 + Hadoop-2.3
            Reporter: Rohith

There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Cluster slow start is set to 1.
A job is running, and its reducer tasks occupy 29GB of the cluster. One NodeManager (NM-4) became unstable (3 map tasks were killed), and the MRAppMaster blacklisted it. All reducer tasks are now running in the cluster. The MRAppMaster does not preempt the reducers because the headroom used in the reducer preemption calculation includes the memory of blacklisted nodes. This makes the job hang forever: the ResourceManager does not assign any new containers on blacklisted nodes, but the available resource it returns still counts the cluster's free memory.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
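
The arithmetic behind the hang described in this report can be sketched as follows. This is a simplified model, not the MRAppMaster code: the function names, the per-node split of the 29GB in use, the 1GB map container size, and the count of pending maps are all assumptions made for illustration. Only the 4 x 8GB cluster, the 29GB total, and the blacklisting of NM-4 come from the report.

```python
GB = 1024  # megabytes per gigabyte

# From the report: 4 NodeManagers with 8GB each; NM-4 is blacklisted.
capacity_mb = {"NM-1": 8 * GB, "NM-2": 8 * GB, "NM-3": 8 * GB, "NM-4": 8 * GB}
blacklisted = {"NM-4"}

# Assumption: the report gives only the 29GB total, so this split is hypothetical.
used_mb = {"NM-1": 8 * GB, "NM-2": 8 * GB, "NM-3": 8 * GB, "NM-4": 5 * GB}

map_request_mb = 1 * GB  # assumed size of one map container
pending_maps = 3         # assumed: the killed maps are waiting to be rescheduled


def headroom(capacity, used, excluded=frozenset()):
    """Free memory summed over every node that is not in `excluded`."""
    return sum(capacity[n] - used[n] for n in capacity if n not in excluded)


def should_preempt_reducers(headroom_mb, pending, map_mb):
    """Simplified rule: preempt when maps are waiting and none would fit."""
    return pending > 0 and headroom_mb < map_mb


# Headroom counted over the whole cluster, blacklisted node included.
reported = headroom(capacity_mb, used_mb)
# Headroom the job can actually use once NM-4 is excluded.
usable = headroom(capacity_mb, used_mb, blacklisted)

print(reported // GB, should_preempt_reducers(reported, pending_maps, map_request_mb))
print(usable // GB, should_preempt_reducers(usable, pending_maps, map_request_mb))
```

Under these assumptions, the cluster-wide headroom is 3GB, so the simplified rule sees room for a map and never preempts a reducer. Excluding NM-4 leaves 0GB, which would trigger preemption and let the pending maps run.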