Date: Mon, 6 Jun 2011 22:12:59 +0000 (UTC)
From: "Matt Foley (JIRA)"
To: hdfs-issues@hadoop.apache.org
Message-ID: <1462661873.2194.1307398379164.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <2086595669.53727.1306773107650.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly

    [ https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045163#comment-13045163 ]

Matt Foley commented on HDFS-2011:
----------------------------------

Hi Ravi, looking a lot better. Here are a few more.

TestCheckpoint.testEditLogFileOutputStreamCloses():

1. Use the two-argument form of the File ctor:
   File(System.getProperty("test.build.data","/tmp"), "editLogStream.dat")
not
   File(System.getProperty("test.build.data","/tmp") + "editLogStream.dat")
This will ensure the path delimiter is inserted correctly.

2. In the "finally" clause: again, there's no point in catch-then-assert unless you need to do something in between; you can just let it fail. The point of the try/catch I recommended was so that it WOULDN'T fail, because failing could prevent any prior exception info from propagating. So the "catch" clause should use println to log the problem, but not fail or otherwise cause an assert.

3. If you want to fine-tune that a little, you could have a variable "success" which is set to false at the beginning and set to true at the end of the main body (before the "finally" clause). Then in this "catch" clause you could throw if success==true, but just println if !success. See the sketch below.

4. If you need to use println or Assert to message an exception, you can use StringUtils.stringifyException(e), which prints the whole stack trace, vs e.toString(), which only prints one line of info.
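To make points 1-3 concrete, here is roughly the shape I have in mind. This is only a sketch, not a drop-in: the FileOutputStream is just a stand-in for however the test actually builds its edit log stream, the try body is a placeholder for the real assertions, and it assumes the test method is declared to throw IOException and that java.io.* and org.apache.hadoop.util.StringUtils are imported.

{code}
File editLogFile = new File(System.getProperty("test.build.data", "/tmp"),
    "editLogStream.dat");                        // two-argument File ctor (point 1)
OutputStream stream = null;
boolean success = false;
try {
  stream = new FileOutputStream(editLogFile);    // stand-in for the real edit log stream
  // ... exercise the stream and make the real assertions here ...
  success = true;
} finally {
  try {
    if (stream != null) {
      stream.close();
    }
    editLogFile.delete();
  } catch (IOException e) {
    if (success) {
      throw e;                                   // nothing else is propagating, so failing here is fine (point 3)
    }
    // an earlier failure is already on its way out; just log this one, don't assert (points 2 and 4)
    System.out.println("cleanup failed: " + StringUtils.stringifyException(e));
  }
}
{code}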
But note that "LOG" and "throw" messages allow using Exception objects as additional arguments, giving the same result as StringUtils.stringifyException() but with cleaner syntax.

testSetCheckpointTimeInStorageHandlesIOException():

5. Use
   File(System.getProperty("test.build.data","/tmp"), "storageDirToCheck")
instead of
   File (System.getProperty("test.build.data","/tmp") + "/storageDirToCheck/")

6. And don't put a space between "File" and the following parenthesis. A ctor is a method call.

7. You probably want to use mkdirs() rather than mkdir().

8. Extra blanks before and after argument lists are against the coding style standard.

9. In assertTrue("List of storage directories didn't have storageDirToCheck."...), did you intend to iterate over all elements in the list? You go to the trouble of creating an iterator, and then use only the first element.

10. Doesn't this routine also need a try/catch context that cleans up the created directory if an error occurs?

EditLogFileOutputStream.close():

11. Interesting problem. It looks like fc and fp are not directly dependent on bufCurrent and bufReady; they are simply likely to be null if bufCurrent and bufReady end up null. So I think we should still treat bufCurrent and bufReady as possibly valid/invalid separately. Can you tell if the problem value of fc is null, or simply a file descriptor that is already closed? What if you use the previously suggested statements for bufCurrent and bufReady, followed by:

{code}
// remove the last INVALID marker from transaction log.
if (fc != null && fc.isOpen()) {
  fc.truncate(fc.position());
}
if (fp != null) {
  fp.close();
}
{code}

> Removal and restoration of storage directories on checkpointing failure doesn't work properly
> ----------------------------------------------------------------------------------------------
>
>                 Key: HDFS-2011
>                 URL: https://issues.apache.org/jira/browse/HDFS-2011
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.23.0
>            Reporter: Ravi Prakash
>            Assignee: Ravi Prakash
>         Attachments: HDFS-2011.3.patch, HDFS-2011.4.patch, HDFS-2011.5.patch, HDFS-2011.patch, HDFS-2011.patch, HDFS-2011.patch
>
>
> Removal and restoration of storage directories on checkpointing failure doesn't work properly. Sometimes it throws a NullPointerException and sometimes it doesn't take off a failed storage directory

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira