hadoop-hdfs-issues mailing list archives

From "Luca Telloli (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HDFS-397) Incorporate storage directories into EditLogFileInput/Output streams
Date Wed, 15 Jul 2009 14:30:15 GMT

     [ https://issues.apache.org/jira/browse/HDFS-397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Luca Telloli updated HDFS-397:

    Attachment: HDFS-397.patch

Raghu, I acknowledge that there might be a problem with the current code, even though the
BackupNode unit tests pass, and I have implemented a few changes that should fix the issue.
The problematic lines, as you correctly pointed out, are:

@@ -295,7 +295,7 @@
-    editLog.divertFileStreams(STORAGE_JSPOOL_DIR + "/" + STORAGE_JSPOOL_FILE);
+    editLog.divertFileStreams();

@@ -363,7 +363,7 @@
-    editLog.revertFileStreams(STORAGE_JSPOOL_DIR + "/" + STORAGE_JSPOOL_FILE);
+    editLog.purgeEditLog();

From what I understand, BackupStorage wants to write into a subdirectory of a storage
directory: not STORAGE_DIR_CURRENT, but STORAGE_JSPOOL_DIR.

Given the original code, and reading current trunk, divertFileStreams does the following:
- iterate over all current storage directories of type EDITS and the open streams of type FILE
- check whether each stream matches the directory
- for each matching stream:
  - close the stream
  - create a new stream, which in this case points to STORAGE_JSPOOL_DIR/STORAGE_JSPOOL_FILE
  - replace the old stream with the new one in the list of streams
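The steps above can be sketched roughly as follows. This is a simplified illustration of the
diversion mechanism only, not the actual FSEditLog code; the method and parameter names are
assumptions, and streams are modeled as plain path strings:

```java
import java.util.*;

public class DivertSketch {
    // Replace each open edits stream with one pointing at the journal
    // spool file under the same storage directory.
    static List<String> divertFileStreams(List<String> editsDirs,
                                          List<String> openStreams,
                                          String spoolRelPath) {
        List<String> result = new ArrayList<>();
        for (String open : openStreams) {
            // check whether the stream matches one of the EDITS directories
            String dir = null;
            for (String d : editsDirs) {
                if (open.startsWith(d + "/")) { dir = d; break; }
            }
            if (dir == null) {
                result.add(open);                    // no matching dir: keep as-is
            } else {
                // the real code closes the old stream and opens a new one;
                // here we just swap the entry in the list of streams
                result.add(dir + "/" + spoolRelPath);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(divertFileStreams(
            Arrays.asList("/data/ns1", "/data/ns2"),
            Arrays.asList("/data/ns1/current/edits", "/data/ns2/current/edits"),
            "jspool/edits.new"));
        // prints [/data/ns1/jspool/edits.new, /data/ns2/jspool/edits.new]
    }
}
```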

In revertFileStreams, the following happens:
- iterate over all current storage directories of type EDITS and the open streams of type FILE
- check whether each stream matches the directory
- for each matching stream:
  - close STORAGE_JSPOOL_DIR/edits.new
  - rename the source file (the method parameter) to CURRENT/edits
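The revert step can likewise be sketched as a mapping from each spool file to its rename
target. Again, this is an illustrative simplification, not the actual Hadoop code, and the
names are assumptions:

```java
import java.util.*;

public class RevertSketch {
    // For each EDITS storage directory, the spooled edits file replaces
    // the regular edits file under current/.
    static Map<String, String> revertFileStreams(List<String> editsDirs,
                                                 String spoolRelPath) {
        Map<String, String> renames = new LinkedHashMap<>();
        for (String dir : editsDirs) {
            String source = dir + "/" + spoolRelPath;   // e.g. jspool/edits.new
            String target = dir + "/current/edits";
            // the real code closes the stream first, then renames source -> target
            renames.put(source, target);
        }
        return renames;
    }

    public static void main(String[] args) {
        System.out.println(revertFileStreams(
            Arrays.asList("/data/ns1"), "jspool/edits.new"));
        // prints {/data/ns1/jspool/edits.new=/data/ns1/current/edits}
    }
}
```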

I'm not sure why the file in STORAGE_JSPOOL_DIR gets renamed to the edits file in
STORAGE_DIR_CURRENT, but I made a couple of modifications so that my code implements the
same mechanism.

I understand, though, that writing into a subdirectory is needed because, even if the
BackupNode and the NameNode run on different machines, they might be writing to a shared
storage directory, for instance on NFS. I guess that is a concrete example of when the test
could fail, although it is not directly reproducible in the test itself.

My modifications add a JournalType parameter to the divertFileStreams() and purgeEditLog()
methods, as follows:
-    editLog.divertFileStreams(STORAGE_JSPOOL_DIR + "/" + STORAGE_JSPOOL_FILE);
+    editLog.divertFileStreams(JournalType.BACKUP);
-    editLog.revertFileStreams(STORAGE_JSPOOL_DIR + "/" + STORAGE_JSPOOL_FILE);
+    editLog.purgeEditLog(JournalType.BACKUP);

which is then used in the EditLogFileOutputStream class to make the correct decisions. The
final strings in BackupStorage have been made protected so that I can read them from
EditLogFileOutputStream.
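How a JournalType parameter could steer the path choice inside the output stream might look
roughly like this. The enum values and method names here are assumptions for illustration,
not the actual patch contents:

```java
public class JournalTypeSketch {
    // Hypothetical journal types: FILE for the regular NameNode edits file,
    // BACKUP for the BackupNode's spool directory.
    enum JournalType { FILE, BACKUP }

    // Choose the on-disk location for an edits stream in a storage
    // directory, depending on who is writing.
    static String editsPathFor(String storageDir, JournalType type) {
        switch (type) {
            case BACKUP:
                return storageDir + "/jspool/edits.new"; // spool subdirectory
            default:
                return storageDir + "/current/edits";    // regular edits file
        }
    }

    public static void main(String[] args) {
        System.out.println(editsPathFor("/data/ns1", JournalType.BACKUP));
        System.out.println(editsPathFor("/data/ns1", JournalType.FILE));
        // prints /data/ns1/jspool/edits.new then /data/ns1/current/edits
    }
}
```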

So I'd say that this is still a reorganization of code (whether a minor one, I'll let you
decide), and no change in semantics is implied or expected. Unit tests pass fine on Linux.

> It is very unrealistic to expect tests to catch every random bug and typo in a patch (for
> anything I have ever worked on or known).
I agree with you, but that's not what I said.

> Incorporate storage directories into EditLogFileInput/Output streams
> --------------------------------------------------------------------
>                 Key: HDFS-397
>                 URL: https://issues.apache.org/jira/browse/HDFS-397
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Luca Telloli
>            Assignee: Luca Telloli
>         Attachments: HADOOP-6001.patch, HADOOP-6001.patch, HADOOP-6001.patch, HDFS-397.patch,

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
