hadoop-common-dev mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3948) Separate Namenodes edits and fsimage
Date Tue, 26 Aug 2008 01:33:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12625584#action_12625584 ]

Konstantin Shvachko commented on HADOOP-3948:

# {{StorageDirType}} does not belong to {{Storage}}. {{Storage}} is a common class for different
storages, so the name-node-specific
{{FSIMAGE, FSEDITS}} fields look extraneous in it.
Let us define an interface {{StorageDirType}} in {{Storage}} and then define {{enum NameNodeDirType
implements StorageDirType}} in FSImage.
I would also rename the enum fields: {{UNDEFINED, IMAGE, EDITS, IMAGE_AND_EDITS}}
Storage {
  interface StorageDirType {
    public StorageDirType getStorageDirType();
    public boolean isOfType(StorageDirType type);
  }
}

FSImage {
  static enum NameNodeDirType implements StorageDirType {
    UNDEFINED, IMAGE, EDITS, IMAGE_AND_EDITS;

    public StorageDirType getStorageDirType() {
      return this;
    }
    public boolean isOfType(StorageDirType type) {
      if (type == IMAGE_AND_EDITS && (this == IMAGE || this == EDITS))
        return true;
      return this == type;
    }
  }
}
Then Storage is able to operate entirely in terms of StorageDirType, and FSImage will pass
NameNodeDirType as a value of StorageDirType.
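A quick way to sanity-check the proposed semantics is to compile the sketch standalone. The following is a self-contained version in plain Java, not the actual Hadoop classes: the enclosing {{Storage}} and {{FSImage}} are omitted, and the {{isOfType()}} logic is copied from the snippet above.

```java
// Standalone sketch of the proposed design: a generic StorageDirType
// interface (to live in Storage) implemented by a name-node-specific
// enum (to live in FSImage). Not actual Hadoop code.
interface StorageDirType {
  StorageDirType getStorageDirType();
  boolean isOfType(StorageDirType type);
}

enum NameNodeDirType implements StorageDirType {
  UNDEFINED, IMAGE, EDITS, IMAGE_AND_EDITS;

  public StorageDirType getStorageDirType() {
    return this;
  }

  public boolean isOfType(StorageDirType type) {
    // As in the snippet above: an IMAGE or EDITS directory answers
    // true when asked whether it is of type IMAGE_AND_EDITS.
    if (type == IMAGE_AND_EDITS && (this == IMAGE || this == EDITS))
      return true;
    return this == type;
  }
}

class StorageDirTypeDemo {
  public static void main(String[] args) {
    System.out.println(NameNodeDirType.IMAGE.isOfType(NameNodeDirType.IMAGE_AND_EDITS)); // true
    System.out.println(NameNodeDirType.IMAGE.isOfType(NameNodeDirType.EDITS));           // false
  }
}
```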
# {{StorageDirectory.root}} should not be public. It is better to introduce {{public File
# {{EditlogFileOutputStream.getFile()}} JavaDoc comment should say
* Returns the file associated with this stream
# In setStorageDirectories() you can loop like this
for (File dirName : fsNameDirs) {
  boolean isAlsoEdits = false;
  // ...
}
# Change parameter name
void processIOError(File dirName) { .... }
I am a bit confused that we have so many processIOError() methods in both FSImage and EditsLog.
Can something be done about it?
# {{FSImage.incrementCheckpointTime()}} iterates through the storage directories and removes those
that have problems.
Removing elements from the collection while iterating over it should break the iterator. Does it?
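For the standard java.util collections the answer is yes: their iterators are fail-fast. A minimal standalone demonstration (plain Java, not Hadoop code) of the broken and the safe removal pattern:

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

public class IteratorRemovalDemo {
  public static void main(String[] args) {
    // Removing through the collection while a for-each loop is iterating
    // trips the fail-fast check on the next iterator step.
    List<String> dirs = new ArrayList<>(List.of("bad", "d2", "d3"));
    try {
      for (String d : dirs)
        if (d.equals("bad"))
          dirs.remove(d);
    } catch (ConcurrentModificationException e) {
      System.out.println("broken: " + e.getClass().getSimpleName());
    }

    // Removing through the iterator itself is the safe pattern.
    dirs = new ArrayList<>(List.of("bad", "d2", "d3"));
    for (Iterator<String> it = dirs.iterator(); it.hasNext(); )
      if (it.next().equals("bad"))
        it.remove();
    System.out.println(dirs); // prints [d2, d3]
  }
}
```

Note that the fail-fast check only fires on the next {{next()}} call, so removing the second-to-last element can end the loop without an exception; that makes this bug easy to miss in testing.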
# In {{FSImage.loadFSImage()}} both the latest image directory ({{latestSD}}) and the latest
edits directory ({{latestEditsDir}}) should be found and checked for consistency (same checkpoint
time) before loading the image.
# And you should be looking for the latest edits dir rather than the one that has the image dir's
latestCheckpointTime. In your implementation, if I by mistake specify an old image dir, an old
edits dir, and a new edits dir, the cluster will start without a problem and will remove the new
edits. We should instead detect this inconsistency and fail to start.
There should be a test case for that too.
# Debug LOG messages in startCheckpoint() should be removed.
# In {{SecondaryNameNode.doMerge()}} you should not scan to the end of the storage directories
but rather pick the first one:
sdImage = dirIterator(StorageDirType.FSIMAGE).next();
sdEdits = dirIterator(StorageDirType.FSEDITS).next();
Checking, of course, that the dirIterator() calls do not return empty iterators.
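The guarded "pick the first matching directory" pattern can be sketched standalone. Here {{StorageDirectory}} and {{dirIterator()}} are simplified stand-ins for the Hadoop types named above, not the real API:

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.List;

// Standalone sketch: take the first matching storage directory, failing
// early if none is configured. StorageDirectory and dirIterator() are
// stand-ins for the real Hadoop classes, not their actual API.
class GuardedPick {
  record StorageDirectory(String name) {}

  // Stand-in for Storage.dirIterator(type): here just a fixed list.
  static Iterator<StorageDirectory> dirIterator(List<StorageDirectory> dirs) {
    return dirs.iterator();
  }

  static StorageDirectory pickFirst(List<StorageDirectory> dirs, String what)
      throws IOException {
    Iterator<StorageDirectory> it = dirIterator(dirs);
    if (!it.hasNext())
      throw new IOException("No " + what + " storage directory configured");
    return it.next(); // take the first directory; no need to scan the rest
  }
}
```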

> Separate Namenodes edits and fsimage 
> -------------------------------------
>                 Key: HADOOP-3948
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3948
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Lohit Vijayarenu
>         Attachments: HADOOP-3948.patch, hadoop-core-trunk.patch, hadoop-core-trunk.patch,
> NameNode's _edits_ and _fsimage_ should be separated, with an option of having them in
> their own separate directories.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
