Date: Fri, 21 Oct 2016 06:19:58 +0000 (UTC)
From: "Prabhu Joseph (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Subject: [jira] [Updated] (MAPREDUCE-6797) Job history server scans can become blocked on a single, slow entry

     [ https://issues.apache.org/jira/browse/MAPREDUCE-6797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prabhu Joseph updated MAPREDUCE-6797:
-------------------------------------
    Fix Version/s: 2.9.0
           Status: Patch Available  (was: Open)

> Job history server scans can become blocked on a single, slow entry
> -------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6797
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6797
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver
>    Affects Versions: 2.4.0, 2.8.0
>            Reporter: Prabhu Joseph
>            Assignee: Prabhu Joseph
>            Priority: Critical
>             Fix For: 2.9.0
>
>         Attachments: 0001-MAPREDUCE-6797.patch, jstack
>
>
> There is one more piece of code in HistoryFileManager where the synchronized keyword on HistoryFileInfo needs to be removed. We hit the JobHistoryServer contention issue in our environment, where the attached stack trace shows HistoryFileManager$JobListCache.addIfAbsent waiting unnecessarily to lock a HistoryFileInfo.
> The synchronization on isMovePending and didMoveFail was already removed by MAPREDUCE-6684.
> {code}
> HistoryFileInfo firstValue = cache.get(key);
> synchronized(firstValue) {   // synchronized is not needed here
>   if (firstValue.isMovePending()) {
>     if(firstValue.didMoveFail() &&
>         firstValue.jobIndexInfo.getFinishTime() <= cutoff) {
>       cache.remove(key);
>       //Now lets try to delete it
>       try {
>         firstValue.delete();
>       } catch (IOException e) {
>         LOG.error("Error while trying to delete history files" +
>             " that could not be moved to done.", e);
>       }
>     } else {
>       LOG.warn("Waiting to remove " + key
>           + " from JobListCache because it is not in done yet.");
>     }
>   } else {
>     cache.remove(key);
>   }
> }
> {code}
> {code}
> Note: the stack trace is from hadoop-2.4.0; the problem exists in the latest Hadoop as well.
> "2144820863@qtp-313351300-38156" daemon prio=10 tid=0x0000000001e13800 nid=0xf133 waiting for monitor entry [0x00007f7c1d8dd000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$JobListCache.addIfAbsent(HistoryFileManager.java:226)
>         - waiting to lock <0x000000040145c4d8> (a org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$HistoryFileInfo)
>         at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.scanIntermediateDirectory(HistoryFileManager.java:825)
>         at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.access$200(HistoryFileManager.java:82)
>         at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$UserLogDir.scanIfNeeded(HistoryFileManager.java:280)
>         - locked <0x0000000400375388> (a org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$UserLogDir)
>         at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.scanIntermediateDirectory(HistoryFileManager.java:792)
>         at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.getAllFileInfo(HistoryFileManager.java:920)
>         at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.getAllPartialJobs(CachedHistoryStorage.java:156)
>         at org.apache.hadoop.mapreduce.v2.hs.JobHistory.getAllJobs(JobHistory.java:235)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
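
As a rough illustration of the change proposed in the description above, the sketch below shows the same eviction check done without taking a lock on the cache entry. It is not the attached patch and not the actual Hadoop code: `EvictionSketch`, `FileInfo`, its `State` enum, and `evictIfPossible` are hypothetical stand-ins, and the sketch assumes the entry's state is held in a volatile field so that the state checks need no monitor (the file deletion and logging of the real code are omitted).

```java
import java.util.concurrent.ConcurrentSkipListMap;

public class EvictionSketch {

    /** Hypothetical stand-in for HistoryFileInfo; only the move state is modeled. */
    static final class FileInfo {
        enum State { IN_INTERMEDIATE, IN_DONE, MOVE_FAILED }

        // Assumption: state is volatile, so readers never need the entry's monitor.
        private volatile State state;
        private final long finishTime;

        FileInfo(State state, long finishTime) {
            this.state = state;
            this.finishTime = finishTime;
        }

        boolean isMovePending() {
            return state == State.IN_INTERMEDIATE || state == State.MOVE_FAILED;
        }

        boolean didMoveFail() {
            return state == State.MOVE_FAILED;
        }

        long getFinishTime() {
            return finishTime;
        }
    }

    /**
     * Tries to evict the entry for key without synchronizing on it.
     * Returns true if the entry was removed from the cache.
     */
    static boolean evictIfPossible(ConcurrentSkipListMap<String, FileInfo> cache,
                                   String key, long cutoff) {
        FileInfo first = cache.get(key);
        if (first == null) {
            return false; // nothing cached under this key
        }
        if (first.isMovePending()) {
            if (first.didMoveFail() && first.getFinishTime() <= cutoff) {
                // The move failed and the entry is old enough: drop it.
                return cache.remove(key, first);
            }
            // Still waiting to be moved to done: keep it cached.
            return false;
        }
        // Already in done: safe to drop from the cache.
        return cache.remove(key, first);
    }

    public static void main(String[] args) {
        ConcurrentSkipListMap<String, FileInfo> cache = new ConcurrentSkipListMap<>();
        cache.put("job_1", new FileInfo(FileInfo.State.IN_DONE, 5L));
        cache.put("job_2", new FileInfo(FileInfo.State.IN_INTERMEDIATE, 5L));
        cache.put("job_3", new FileInfo(FileInfo.State.MOVE_FAILED, 5L));
        long cutoff = 10L;
        for (String key : new String[] {"job_1", "job_2", "job_3"}) {
            System.out.println(key + " evicted: " + evictIfPossible(cache, key, cutoff));
        }
    }
}
```

Because each check reads the volatile state directly, a scanning thread in this sketch does not block behind another thread that holds the entry's monitor for a slow operation, which is the contention the stack trace shows.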