Date: Tue, 4 Jun 2013 03:23:21 +0000 (UTC)
From: "Maysam Yabandeh (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Reply-To: mapreduce-issues@hadoop.apache.org
Subject: [jira] [Commented] (MAPREDUCE-5267) History server should be more robust when cleaning old jobs

[ https://issues.apache.org/jira/browse/MAPREDUCE-5267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673993#comment-13673993 ]

Maysam Yabandeh commented on MAPREDUCE-5267:
--------------------------------------------

I believe the particular bug reported in this JIRA is rooted in the implementation of listFiles. I attempted to reproduce the reported scenario by creating a directory DIR under /mapred/history/done/ with only root access.
On my local machine, the current unit tests pass smoothly over DIR, with listFiles() returning an empty list for it. I guess this is not the case for HDFS, where, as this JIRA reports, an exception will be raised (although I have not managed to run a unit test that exercises this).

Nevertheless, I agree with you that this problem should be addressed at a higher level, since we do not know which unpredictable scenario will next raise an exception in the clean procedure. I would like to pick up this JIRA, but I do not know how to write a unit test that exercises a method by raising (general) exceptions in the middle of it.

> History server should be more robust when cleaning old jobs
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-5267
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5267
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobhistoryserver
>    Affects Versions: 0.23.7, 2.0.4-alpha
>            Reporter: Jason Lowe
>
> Ran across a situation where an admin user had accidentally created a directory in one of the date directories under /mapred/history/done/ that was not readable by the historyserver user. That effectively prevented the history server from cleaning any jobs from that date forward, as it hit an IOException trying to scan the directory and that aborted the entire clean process.
> The history server should localize IOException handling to the directory/file being processed and move on to the next entry in the list rather than aborting the entire cleaning process.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
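
For illustration only, the per-directory handling requested in the issue description, and one way to provoke an exception in the middle of the clean loop from a test, might be sketched as follows. This is not the history server code: CleanSketch, DirLister, Result, and clean are hypothetical names, and the DirLister interface is an assumed stand-in for whatever file-system call actually lists a date directory.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; none of these names come from the Hadoop code base. */
class CleanSketch {

    /** Assumed stand-in for the file-system call that lists one date directory. */
    interface DirLister {
        List<String> listFiles(String dir) throws IOException;
    }

    /** Outcome of one cleaning pass. */
    static final class Result {
        final List<String> cleaned = new ArrayList<>();
        final List<String> skippedDirs = new ArrayList<>();
    }

    /**
     * Scans every directory in order. An IOException on one directory is
     * recorded and the loop moves on to the next entry instead of aborting.
     */
    static Result clean(List<String> dateDirs, DirLister lister) {
        Result result = new Result();
        for (String dir : dateDirs) {
            try {
                // Stand-in for deleting old job files: just collect what was found.
                result.cleaned.addAll(lister.listFiles(dir));
            } catch (IOException e) {
                // Failure stays local to this directory; real code would also log it.
                result.skippedDirs.add(dir);
            }
        }
        return result;
    }
}
```

Because the listing call is passed in, a unit test can supply a lister that throws IOException for one chosen directory and then check that the directories after it were still processed. Whether the real clean procedure can be given such a seam (or a mocked file system) is an open question here, not something the issue states.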