hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ivan Kelly (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1725) Cleanup FSImage construction
Date Thu, 24 Mar 2011 16:28:05 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010748#comment-13010748

Ivan Kelly commented on HDFS-1725:

I've address some of your comments in the newest patch. The others I have responded to below.

{quote}1. FSImage.java: Make members storage, conf final{quote}

storage cannot be final due to TestSaveNamespace needing to spy.

{quote}1. The log {{NameNode.LOG.info("set FSImage.restoreFailedStorage");}} is removed. Is
this not necessary?{quote}
NNStorage.restoreFailedStorage already logs this, so the log statement in FSImage is redundant.

{quote}1. The old set of constructors and new are not equivalent. For example: FSImage(Configuration)
constructor in the old code never set editLog. Similarly only one conustros called setCheckpointDirectories(),
now all constructors do. Is this fine?

Where is the functionality of {{FSImage(URI imageDir)}} imeplemented? Corresponding change
in CreateEditsLog.java could have been avoided. I think this is a valid use case and a constructor
and it should nto be removed.

FSImage(Configuration) calls FSImage() which calls FSImage(FSNameSystem) which creates FSEditLog.
It's not very clear, but all constructors eventually call FSImage(FSNameSystem). 

Setting the checkpoint directories in all constructors is harmless. It is only used for import
and SecondaryNameNode (which is there FSImage(Configuration) is used). Setting it doesn't
cause problems and removes unnecessary branching from the code. 

FSImage(Configuration) is also called from FSDirectory(FSNamesystem ns, Configuration conf)
which is what is called in the default case for primary NameNode. 

FSImage(URI) and FSImage(Collection<URI>, Collection<URI>) are only ever used
in test code. I don't think it's good to keep divergent code in production solely for the
purpose of testing, if it can be removed with minimal hassle.

{quote}2. Why are you calling attemptRestoreRemovedStorage() in #reset() method? {quote}

reset() should bring the image back to it's initial state. If a storage has been removed,
then it is not in it's initial state. The actual affect of this call is the same as it was
before. Before, recoverCreateRead called setStorageDirectory, which unlocked which did the
same thing. I moved it into the call to reset (instead of putting it in setStorageDirectory)
because it is only needed on reset. The other call to recoverCreateRead is during initialisation,
so no storage should have been removed by then.

{quote} 4. NameNode.java - why did you remove in #format method, adding editDirsToFormat to
dirsToFormat? The list of directories passed into FSImage in this method are now changed to
the list of directories in conf? {quote}

editDirsToFormat is added to dirsToFormat so that it confirms with the user if they want to
format the directory. Before, you could format an edits dir without confirming with the user,
which is risky.

The list of directories are the same in both cases. getNamespaceDirs(conf) is used in FSImage
if no directories are specified. That said, it would be clearer, and safer, that the names
are explicitly passed to FSImage. Changed. 

{quote}6. SecondaryNameNode.java - in #startCheckpoint() - why did you remove unlockAll()
call? Also recoverCreate() has attemptRestoreRemovedStorage() and unlockAll() call? {quote}

This is similar to the case in BackupNode. setStorageDirectories was only being called to
reset the state of the FSImage. attemptRestoreRemovedStorage is used to try to bring back
any removed directories. unlockAll() is only there to allow analyseStorage to run (it tries
to lock).

{quote}8. TestSaveNamespace.java - why is close() method call to unlock storage not required
any more? {quote}

I assume this is referring to testSaveWhileEditsRolled(). The whole spy() is unnecessary in
this test as nothing is spied on. I've not removed it.

The calls to close() and then setStorageDirectories() in these tests are only there because
mockito doesn't like inner classes, so when you spy FSImage, some things go missing. It's
because StorageDirectory is an innerclass of Storage, so when you initially create the StorageDirectories,
they refer to this original instance of Storage (NNStorage in this case). Then you set a spy
on the storage, creating a new instance and replacing the original. The storageDirectories
still refer to the original, so when you call write and things like that, they'll write invalid
values for checksums and such. It's a really annoying issue, but I'm not sure how to resolve
it, so for now we just call close() and setStorageDirectories() on NNStorage if we want to

> Cleanup FSImage construction
> ----------------------------
>                 Key: HDFS-1725
>                 URL: https://issues.apache.org/jira/browse/HDFS-1725
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>             Fix For: 0.23.0
>         Attachments: HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.diff, HDFS-1725.diff
> FSImage construction is messy. Sometimes the storagedirectories in use are set straight
away, sometimes they are not. This makes it hard for anything under FSImage (i.e. FSEditLog)
to make assumptions about what it can use. Therefore, this patch makes FSImage set the storage
directories in use during construction, and never allows them to change. If you want to change
storagedirectories you create a new image.
> Also, all the construction code should be the same with the only difference being the
parameters passed. When not passed, these should get sensible defaults.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

View raw message