hadoop-hdfs-issues mailing list archives

From "Matt Foley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly
Date Mon, 06 Jun 2011 22:12:59 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045163#comment-13045163 ]

Matt Foley commented on HDFS-2011:
----------------------------------

Hi Ravi, this is looking a lot better.  Here are a few more comments.

TestCheckpoint.testEditLogFileOutputStreamCloses():
1. Use the two-argument form of File ctor:
File(System.getProperty("test.build.data","/tmp"), "editLogStream.dat")
not
File(System.getProperty("test.build.data","/tmp") + "editLogStream.dat")
This will ensure the path delimiter is inserted correctly.
2. in the "finally" clause:
Again, there's no point in catch-then-assert unless you need to do something in between;
you can just let it fail.  The point of the try/catch I recommended was so that it WOULDN'T
fail, because failing there could prevent any prior exception info from propagating.  So the
"catch" clause should use println to log the problem, but not fail or otherwise trigger an assert.
3. if you want to fine-tune that a little, you could have a variable "success" which is set
to false at the beginning, and set to true at the end of the main body (before the "finally"
clause).  Then in this "catch" clause you could throw if success==true, but just println if
!success.
4. If you need to use println or Assert to report an exception, you can use StringUtils.stringifyException(e),
which prints the whole stack trace, vs. e.toString(), which only prints one line of info. But
"LOG" and "throw" messages allow passing Exception objects as additional arguments, giving the
same result as StringUtils.stringifyException() but with cleaner syntax.
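To make points 2-4 concrete, here is a minimal, self-contained sketch of the success-flag pattern; doWork() and closeStream() are hypothetical stand-ins for the real test body and the stream close:

```java
import java.io.IOException;

public class SuccessFlagDemo {
    // Flags to simulate failures in the body and in cleanup.
    static boolean failBody;
    static boolean failClose;

    static void doWork() throws IOException {
        if (failBody) throw new IOException("body failed");
    }

    static void closeStream() throws IOException {
        if (failClose) throw new IOException("close failed");
    }

    static void run() throws IOException {
        boolean success = false;
        try {
            doWork();
            success = true;   // set only after the main body completed
        } finally {
            try {
                closeStream();
            } catch (IOException e) {
                if (success) {
                    throw e;  // no prior exception to mask: propagate
                }
                // A body exception is already in flight; log, don't mask it.
                System.err.println("cleanup failed: " + e);
            }
        }
    }

    public static void main(String[] args) {
        try {
            failBody = true;
            failClose = true;
            run();
        } catch (IOException e) {
            // The BODY's exception survives, not the close() one:
            System.out.println(e.getMessage()); // prints "body failed"
        }
    }
}
```

Because the cleanup catch only rethrows when success is true, the original body exception propagates out of the finally block untouched.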

testSetCheckpointTimeInStorageHandlesIOException():
5. Use File(System.getProperty("test.build.data","/tmp"), "storageDirToCheck")
instead of File (System.getProperty("test.build.data","/tmp") + "/storageDirToCheck/")
6. and don't put a space between "File" and the following parenthesis.  A ctor is a method
call.
7. You probably want to use mkdirs() rather than mkdir().
8. Extra blanks before and after argument lists are against the coding style standard.
9. In "assertTrue("List of storage directories didn't have storageDirToCheck."...), did you
intend to iterate over all elements in the list?  You go to the trouble of creating an iterator,
and then only use the first element.
10. Doesn't this routine also need a try/catch context that cleans up the created directory
if an error occurs?
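Putting points 5-10 together, a rough, self-contained sketch of the shape I have in mind; the storageDirs list here is just a stand-in for whatever the real test obtains from the NameNode:

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

public class StorageDirCheckSketch {
    public static void main(String[] args) {
        // Points 5-6: two-argument ctor, no space before the parenthesis.
        File storageDirToCheck = new File(
            System.getProperty("test.build.data", "/tmp"), "storageDirToCheck");
        // Point 7: mkdirs() creates missing parent directories too.
        if (!storageDirToCheck.mkdirs() && !storageDirToCheck.isDirectory()) {
            throw new AssertionError("could not create " + storageDirToCheck);
        }
        try {
            // Point 9: iterate over ALL elements, not just the first.
            List<File> storageDirs = Arrays.asList(storageDirToCheck);
            boolean found = false;
            for (File dir : storageDirs) {
                if ("storageDirToCheck".equals(dir.getName())) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                throw new AssertionError(
                    "List of storage directories didn't have storageDirToCheck.");
            }
        } finally {
            // Point 10: clean up the created directory even on failure.
            storageDirToCheck.delete();
        }
    }
}
```

In real Hadoop test code FileUtil.fullyDelete() would be the usual choice for a directory that may be non-empty; plain delete() suffices here only because the sketch creates nothing inside it.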

EditLogFileOutputStream.close():
11. Interesting problem.  It looks like fc and fp are not directly dependent on bufCurrent
and bufReady; they are simply likely to be null when bufCurrent and bufReady end up null, so
I think we should still treat bufCurrent and bufReady as possibly valid/invalid separately.
 Can you tell if the problem value of fc is null or simply a file descriptor that is already
closed?  What if you use the previously suggested statements for bufCurrent and bufReady,
followed by:
{code}
    // remove the last INVALID marker from transaction log.
    if (fc != null && fc.isOpen()) {
      fc.truncate(fc.position());
    }
    if (fp != null) {
      fp.close();
    }
{code}
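On the "already closed" question: here is a tiny stand-alone demo (file name arbitrary) of why the isOpen() guard matters. Closing the FileOutputStream also closes its channel, and calling position() or truncate() on a closed channel throws ClosedChannelException even though fc itself is non-null:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class ChannelCloseDemo {
    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("editLogStream", ".dat");
        f.deleteOnExit();
        FileOutputStream fp = new FileOutputStream(f);
        FileChannel fc = fp.getChannel();
        fc.write(ByteBuffer.wrap(new byte[] {1, 2, 3, 4}));
        fp.close();  // closing the stream closes its channel too

        // fc is non-null but closed, so a bare null check is not enough;
        // the isOpen() guard skips truncate() instead of letting
        // position() throw ClosedChannelException.
        if (fc != null && fc.isOpen()) {
            fc.truncate(fc.position());
        }
        System.out.println("isOpen=" + fc.isOpen()); // prints "isOpen=false"
    }
}
```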


> Removal and restoration of storage directories on checkpointing failure doesn't work properly
> ---------------------------------------------------------------------------------------------
>
>                 Key: HDFS-2011
>                 URL: https://issues.apache.org/jira/browse/HDFS-2011
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.23.0
>            Reporter: Ravi Prakash
>            Assignee: Ravi Prakash
>         Attachments: HDFS-2011.3.patch, HDFS-2011.4.patch, HDFS-2011.5.patch, HDFS-2011.patch, HDFS-2011.patch, HDFS-2011.patch
>
>
> Removal and restoration of storage directories on checkpointing failure doesn't work properly. Sometimes it throws a NullPointerException and sometimes it doesn't take off a failed storage directory

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
