Date: Tue, 28 Apr 2015 11:36:12 +0000 (UTC)
From: "Hudson (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Subject: [jira] [Commented] (MAPREDUCE-6252) JobHistoryServer should not fail when encountering a missing directory

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516878#comment-14516878 ]

Hudson commented on MAPREDUCE-6252:
-----------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #168 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/168/])
Moving MAPREDUCE-6252 to the 2.7.1 CHANGES.txt (devaraj: rev 99fe03e439b0f9afd01754d998c6eb64f0f70300)
* hadoop-mapreduce-project/CHANGES.txt

> JobHistoryServer should not fail when encountering a missing directory
> ----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6252
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6252
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver
>    Affects Versions: 2.6.0
>            Reporter: Craig Welch
>            Assignee: Craig Welch
>             Fix For: 2.8.0, 2.7.1
>
>         Attachments: MAPREDUCE-6252.0.patch, MAPREDUCE-6252.1.patch
>
>
> The JobHistoryServer maintains a cache of job serial number parts to dfs paths which it uses when seeking a job it no longer has in its memory cache, multiple directories for a given serial number differentiated by time stamp.  At present the jobhistory server will fail any time it attempts to find a job in a directory which no longer exists based on that cache - even though the job may well exist in a different directory for the serial number.  Typically this is not an issue, but the history cleanup process removes the directory from dfs before removing it from the cache which leaves a window of time where a directory may be missing from dfs which is present in the cache, resulting in failure.  For some dfs's it appears that the top level directory may become unavailable some time before the full deletion of the tree completes which extends what might otherwise be a brief period of failure to a more extended period.  Further, this also places the service at the mercy of outside processes which might remove those directories.  The proposal is simply to make the server resistant to this state such that encountering this missing directory is not fatal and the process will continue on to seek it elsewhere.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
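
For illustration only, the behaviour proposed in the description (skip a cached directory that has since disappeared and keep looking in the remaining directories for the same serial number) might be sketched as follows. This is not the attached patch. The class `SerialDirLookup` and the method `findJobFile` are hypothetical names, and `java.nio.file` stands in for the Hadoop `FileSystem` API so that the sketch is self-contained.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.NotDirectoryException;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

// Hypothetical helper; java.nio.file is an assumed stand-in for the dfs client.
final class SerialDirLookup {

    private SerialDirLookup() {
    }

    /**
     * Scans every directory cached for one serial number and returns the
     * first entry whose file name starts with jobId. A cached directory that
     * no longer exists is treated as a stale cache entry and skipped.
     */
    static Optional<Path> findJobFile(List<Path> cachedDirs, String jobId) {
        for (Path dir : cachedDirs) {
            try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
                for (Path entry : entries) {
                    if (entry.getFileName().toString().startsWith(jobId)) {
                        return Optional.of(entry);
                    }
                }
            } catch (NoSuchFileException | NotDirectoryException e) {
                // The directory was removed after it was cached (for example
                // by history cleanup); not fatal, so try the next candidate.
            } catch (IOException e) {
                // Any other I/O problem is still reported to the caller.
                throw new UncheckedIOException(e);
            }
        }
        return Optional.empty();
    }
}
```

In this sketch only the "directory is gone" case is swallowed; other I/O errors still propagate, so a genuinely unhealthy file system is not masked by the lookup.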