hadoop-common-dev mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5189) Integration with BookKeeper logging system
Date Fri, 10 Apr 2009 21:07:15 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697979#action_12697979 ]

Konstantin Shvachko commented on HADOOP-5189:

# I've got a compile error:
[javac] hadoop/src/hdfs/org/apache/hadoop/hdfs/server/namenode/BackupStorage.java:29: cannot find symbol
[javac] symbol  : class EditLogFileInputStream
[javac] location: class org.apache.hadoop.hdfs.server.namenode.FSEditLog
[javac] import org.apache.hadoop.hdfs.server.namenode.FSEditLog.EditLogFileInputStream;
# It is better to create a separate JIRA for factoring out {{EditLogFileOutputStream}} and
{{EditLogFileInputStream}} from {{FSEditLog}}. This makes sense with or without BookKeeper,
and it will keep the refactoring from obscuring the changes you actually make to the code.
# Why do you need a new method {{setStorageDirectories()}} with a Boolean parameter that
is not used anywhere inside it?
# We will need some automation that adds the ZooKeeper and BookKeeper jars to the project
and keeps them synchronized with new releases.
Can it be done with Ivy?
# I agree that the edits input part of the code is not generalized for input streams other than
EditLogFileInputStream. This is because there were no alternatives yet. We should work on

The drawback of the approach you implement, besides requiring separate image and edits
directories, as you mention, is that you have no way to retrieve the latest checkpoint
time from BookKeeper. This is critical for choosing the latest version of the journal,
and you can only get the latest checkpoint time from the local file (StorageDirectory) that
corresponds to the stream. The StorageDirectory may be out of sync with the real state of
the BookKeeper data.

Suppose that you use one file output stream and one BKOutputStream.
Suppose the BookKeeper output stream dies, the name-node keeps writing to the file output
stream for another hour or so, and then gets restarted.
If the name-node is configured to read from the BookKeeper input stream, it will get an outdated
state of the namespace, because the current state is in the local file, not in BookKeeper.
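
To make the scenario above concrete, here is a minimal sketch in Java. It is not the actual name-node code: the class {{JournalSelectionSketch}}, the {{Journal}} type, its fields, and {{pickByCheckpointTime()}} are all hypothetical names invented for illustration, and the transaction counts are made-up numbers. It assumes that the checkpoint time is read from the local record kept next to each stream, so both journals report the same value even though only the file journal received the last hour of edits.

```java
import java.util.Arrays;
import java.util.List;

public class JournalSelectionSketch {

    /** Hypothetical stand-in for one edits stream plus its local StorageDirectory record. */
    static final class Journal {
        final String name;
        final long localCheckpointTime; // as read from the local StorageDirectory
        final long lastTxnStored;       // what the journal really contains (not visible to the chooser)

        Journal(String name, long localCheckpointTime, long lastTxnStored) {
            this.name = name;
            this.localCheckpointTime = localCheckpointTime;
            this.lastTxnStored = lastTxnStored;
        }
    }

    /** Picks the journal with the greatest locally recorded checkpoint time; the first one wins on ties. */
    static Journal pickByCheckpointTime(List<Journal> journals) {
        Journal best = null;
        for (Journal j : journals) {
            if (best == null || j.localCheckpointTime > best.localCheckpointTime) {
                best = j;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Same checkpoint time on both, but the BookKeeper stream died early (illustrative numbers).
        Journal bk = new Journal("bookkeeper", 1000L, 500L);
        Journal file = new Journal("file", 1000L, 900L);

        // BookKeeper is listed first, as if the name-node were configured to read from it.
        Journal chosen = pickByCheckpointTime(Arrays.asList(bk, file));
        System.out.println("chosen=" + chosen.name
                + " lastTxn=" + chosen.lastTxnStored
                + " fileLastTxn=" + file.lastTxnStored);
    }
}
```

Because the checkpoint time alone cannot tell the two journals apart, the choice falls to configuration order, and the stale BookKeeper journal can be selected.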

In general I am very glad that this is moving in the right direction and we will eventually
have a framework which will allow us to plug in different logging systems and intermix them if

> Integration with BookKeeper logging system
> ------------------------------------------
>                 Key: HADOOP-5189
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5189
>             Project: Hadoop Core
>          Issue Type: New Feature
>    Affects Versions: 0.19.0
>            Reporter: Luca Telloli
>         Attachments: create.png, HADOOP-5189-trunk-preview.patch, HADOOP-5189-trunk-preview.patch, HADOOP-5189.patch, HADOOP-5189.patch
> BookKeeper is a system to reliably log streams of records (https://issues.apache.org/jira/browse/ZOOKEEPER-276).
> The NameNode is a natural target for such a system because it is the metadata repository of the
> entire file system for HDFS.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
