Date: Fri, 17 Jul 2015 06:28:04 +0000 (UTC)
From: "Ryu Kobayashi (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Subject: [jira] [Updated] (MAPREDUCE-6436) JobHistory cache issue

[ https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryu Kobayashi updated MAPREDUCE-6436:
-------------------------------------
    Assignee: Ryu Kobayashi
      Status: Patch Available  (was: Open)

> JobHistory cache issue
> ----------------------
>
>                 Key: MAPREDUCE-6436
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6436
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Ryu Kobayashi
>            Assignee: Ryu Kobayashi
>         Attachments: MAPREDUCE-6436.1.patch, stacktrace1.txt, stacktrace2.txt, stacktrace3.txt
>
>
> Problem:
> HistoryFileManager.addIfAbsent produces a large volume of log output when the
> number of cached entries whose age is less than mapreduce.jobhistory.max-age-ms
> far exceeds mapreduce.jobhistory.joblist.cache.size.
> Example:
> Suppose the cache contains 50,000 entries in total, 10,000 of which are newer
> than mapreduce.jobhistory.max-age-ms, and mapreduce.jobhistory.joblist.cache.size
> is 20,000. HistoryFileManager.addIfAbsent then writes 50,000 - 20,000 = 30,000
> lines of the "Waiting to remove from JobListCache because it is not in done yet"
> message. Stack traces are attached.
> Impact:
> Besides consuming a large amount of disk space, this issue blocks
> JobHistory.getJob for a long time and slows job execution significantly,
> because getJob is called from RPC handlers such as
> HistoryClientService.HSClientProtocolHandler.getJobReport.
> This happens because HistoryFileManager.UserLogDir.scanIfNeeded eventually
> calls HistoryFileManager.addIfAbsent inside a synchronized block. When
> multiple threads call scanIfNeeded simultaneously, one of them acquires the
> lock and the others are blocked until the first thread completes the
> long-running HistoryFileManager.addIfAbsent call.
> Solution:
> * Reduce the amount of logging so that HistoryFileManager.addIfAbsent no
>   longer takes so long.
> * Nice to have, if possible: HistoryFileManager.UserLogDir.scanIfNeeded skips
>   scanning if another thread is already scanning. This changes the semantics
>   of some HistoryFileManager methods (such as getAllFileInfo and getFileInfo),
>   because scanIfNeeded would then leave outdated state in place.
> * Nice to have, if possible: make scanIfNeeded asynchronous so that RPC calls
>   are not blocked by a loop over tens of thousands of entries.
>
> This patch implements the first item.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
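The first Solution item (less logging in HistoryFileManager.addIfAbsent) can be illustrated with a simplified, hypothetical model of the eviction loop. This is not the HistoryFileManager code or the attached patch: `JobListCacheSketch`, `CachedJob`, and the `movePending` flag are assumed stand-ins for the real job list cache and HistoryFileInfo, and the sketch omits the max-age cutoff and failed-move handling. It counts the entries it cannot evict yet and writes one summary line per call instead of one line per entry.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.logging.Logger;

/** Hypothetical, simplified model of a size-bounded job list cache. */
final class JobListCacheSketch {
    private static final Logger LOG = Logger.getLogger(JobListCacheSketch.class.getName());

    /** Hypothetical stand-in for HistoryFileInfo, reduced to what the sketch needs. */
    static final class CachedJob {
        final String jobId;
        final boolean movePending; // true while the history file is not in "done" yet

        CachedJob(String jobId, boolean movePending) {
            this.jobId = jobId;
            this.movePending = movePending;
        }
    }

    // Keyed by job ID; iteration visits keys in ascending order.
    private final ConcurrentSkipListMap<String, CachedJob> cache = new ConcurrentSkipListMap<>();
    private final int maxSize;

    JobListCacheSketch(int maxSize) {
        this.maxSize = maxSize;
    }

    /**
     * Adds the job if absent, then evicts evictable entries while the cache is
     * over maxSize. Returns how many entries had to be skipped.
     */
    int addIfAbsent(CachedJob job) {
        cache.putIfAbsent(job.jobId, job);
        int skipped = 0;
        int size = cache.size();
        Iterator<Map.Entry<String, CachedJob>> it = cache.entrySet().iterator();
        while (size > maxSize && it.hasNext()) {
            CachedJob candidate = it.next().getValue();
            if (candidate.movePending) {
                skipped++; // count it instead of logging one line per entry
            } else {
                it.remove();
                size = cache.size();
            }
        }
        if (skipped > 0) {
            // A single summary line per call instead of `skipped` separate lines.
            LOG.warning("Waiting to remove " + skipped
                    + " entries from JobListCache because they are not in done yet.");
        }
        return skipped;
    }

    int size() {
        return cache.size();
    }
}
```

With this shape the log output per call stays constant no matter how many entries are still waiting to move to done, which is the effect the first item is after.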
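The second Solution item (skip scanning if another thread is already scanning) could be approached with `java.util.concurrent.locks.ReentrantLock.tryLock()` in place of a `synchronized` block: a thread that finds a scan in progress returns immediately instead of waiting for it. `ScanGuardSketch` and its `scanIfNeeded(Runnable)` method are hypothetical names, not the HistoryFileManager.UserLogDir API, and, as the issue notes, a caller that skips may then see outdated state.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical guard that lets at most one thread scan at a time; others skip. */
final class ScanGuardSketch {
    private final ReentrantLock scanLock = new ReentrantLock();

    /**
     * Runs the scan if no other thread is currently scanning.
     * Returns true if this thread performed the scan, false if it skipped.
     */
    boolean scanIfNeeded(Runnable scan) {
        if (!scanLock.tryLock()) {
            // Another thread holds the lock: return at once instead of blocking.
            return false;
        }
        try {
            scan.run();
            return true;
        } finally {
            scanLock.unlock();
        }
    }
}
```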