hadoop-hdfs-issues mailing list archives

From "Jitendra Nath Pandey (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1580) Add interface for generic Write Ahead Logging mechanisms
Date Fri, 29 Apr 2011 01:19:03 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026802#comment-13026802 ]

Jitendra Nath Pandey commented on HDFS-1580:

> In the file-based storage there's no clean way to seek to a particular transaction ID
 A savenamespace will be preceded by a call to mark (like the current roll). A file implementation
can close the current file and start a new one at that point. Therefore, in usual operation,
when a namenode starts up, it will load an fsimage and request the transactions after
that point, and it will most likely find a file that starts at the next transaction id.
 Alternatively, a file implementation can ignore mark and close a file every 100000 transactions.
If it then has to seek to the 50000th transaction, it can just read and discard the preceding
transactions. Since transaction files will be read only for checkpointing, at namenode startup,
or by the backup at failover, this is not very expensive. In a recent measurement we found that
the namenode could load 1.4M transactions in 27 seconds.
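
 The two file strategies described above (start a new file on mark, or roll at a fixed
transaction count and skip forward on read) can be sketched as follows. This is a minimal
in-memory illustration, not the interface proposed in HDFS-1580: the class SegmentedJournal
and its methods append, mark, segmentStartFor and readFrom are hypothetical names, and each
"file" is modelled as a list of transaction ids.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for a file-based journal (illustration only). */
class SegmentedJournal {
    private final int maxTxnsPerSegment;
    // Maps the first transaction id of each segment ("file") to the ids stored in it.
    private final TreeMap<Long, List<Long>> segments = new TreeMap<>();
    private List<Long> current = null;
    private long nextTxId = 1;

    SegmentedJournal(int maxTxnsPerSegment) {
        if (maxTxnsPerSegment <= 0) {
            throw new IllegalArgumentException("maxTxnsPerSegment must be positive");
        }
        this.maxTxnsPerSegment = maxTxnsPerSegment;
    }

    /** Appends one transaction, rolling to a new segment when the current one is full. */
    long append() {
        if (current == null || current.size() >= maxTxnsPerSegment) {
            current = new ArrayList<>();
            segments.put(nextTxId, current);
        }
        current.add(nextTxId);
        return nextTxId++;
    }

    /** Closes the current segment so that the next append starts a new one. */
    void mark() {
        current = null;
    }

    /** Returns the first transaction id of the segment that would hold txId, or null. */
    Long segmentStartFor(long txId) {
        return segments.floorKey(txId);
    }

    /** Returns every transaction with id >= fromTxId, discarding earlier ones. */
    List<Long> readFrom(long fromTxId) {
        List<Long> result = new ArrayList<>();
        if (segments.isEmpty()) {
            return result;
        }
        // Start at the segment that contains fromTxId, or at the first segment.
        Long start = segments.floorKey(fromTxId);
        if (start == null) {
            start = segments.firstKey();
        }
        for (List<Long> segment : segments.tailMap(start, true).values()) {
            for (long txId : segment) {
                if (txId >= fromTxId) { // read and ignore the preceding transactions
                    result.add(txId);
                }
            }
        }
        return result;
    }
}
```

 With mark called before a savenamespace, the transaction after the image is the first one
in its segment, so nothing has to be skipped. With fixed-size rolling, at most one segment's
worth of earlier transactions is read and discarded.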

 Also, if we store edit logs in BookKeeper, the 2NN can read from BookKeeper directly and there
won't be a need for edit transfer, which is another attraction of using BookKeeper.

> This seems like a somewhat serious flaw. If we anticipate using BK for HA.. 
  Agreed that the backup will lag behind the primary, but when failover happens it can quickly
read the additional transactions before declaring itself active. Won't that be an acceptable
delay? There is some discussion of this in ZOOKEEPER-1016.

> Another way of doing this is to say that, if an implementation does have this limitation,
it can choose to "mark" whenever it likes.
  That is correct; however, mark will still be useful in the interface, to be called before a savenamespace.

> Most operations write the edit to the log while holding the FSN lock (to ensure serialized
order between ops) and then drop the FSN lock to sync
  Good catch! A sync method is needed in EditLogOutputStream, to be called after releasing
the lock.
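
  The pattern in question (append the edit under the lock to fix its order, then make it
durable after the lock is released) might look like the sketch below. Everything here is
assumed for illustration: BufferedEditLog, applyAndLog and sync are hypothetical names, a
ReentrantLock stands in for the FSN lock, and an in-memory list stands in for stable storage.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch of "write under the lock, sync after releasing it". */
class BufferedEditLog {
    private final ReentrantLock fsnLock = new ReentrantLock(); // stand-in for the FSN lock
    private final List<String> buffered = new ArrayList<>();   // written but not yet synced
    private final List<String> durable = new ArrayList<>();    // stand-in for stable storage
    private long lastWrittenTxId = 0;
    private long lastSyncedTxId = 0;

    /** Buffers the edit while holding the lock, which serializes the order of operations. */
    long applyAndLog(String op) {
        fsnLock.lock();
        try {
            // The namespace change itself would be applied here, under the same lock.
            synchronized (this) {
                buffered.add(op);
                return ++lastWrittenTxId;
            }
        } finally {
            fsnLock.unlock();
        }
    }

    /** Makes everything up to txId durable; must be called after the lock is released. */
    void sync(long txId) {
        if (fsnLock.isHeldByCurrentThread()) {
            throw new IllegalStateException("sync must run after the lock is released");
        }
        synchronized (this) {
            if (txId <= lastSyncedTxId) {
                return; // an earlier sync already covered this transaction
            }
            durable.addAll(buffered);
            buffered.clear();
            lastSyncedTxId = lastWrittenTxId;
        }
    }

    synchronized int durableCount() {
        return durable.size();
    }

    synchronized long getLastSyncedTxId() {
        return lastSyncedTxId;
    }
}
```

  One sync can cover edits buffered by several callers, so the slow step is not paid while
the lock is held. The sketch simplifies by blocking writers on the object monitor during the
flush; a real stream would need to avoid that.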

> edit log transfer right now is based around the concept of discrete files which can be
entirely fetched, with an associated md5sum
  I think it should be the file storage implementation's responsibility to keep an md5sum with
every file, so the safety check while transferring files can still be supported.
  This interface doesn't manage the transfer of edit logs. It only covers reading/writing
transactions from/to a storage. When the 2NN wants to do a checkpoint, it will download the
files from the primary, then get an EditLogInputStream object for the edit log files using
this interface, and read the transactions.
 For BookKeeper storage, transfer will not be required.
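
 As a rough illustration of the per-file md5sum check, the sketch below computes an MD5
digest for a file's bytes and compares it with the digest recorded by the sender. The class
SegmentChecksum and its methods are hypothetical; only java.security.MessageDigest is a real
API, and where the digest is stored alongside the file is left open.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper for checking a transferred edit log file against its md5sum. */
class SegmentChecksum {
    /** Returns the MD5 digest of data as a lowercase hex string. */
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** Returns true if the received bytes match the md5sum kept with the file. */
    static boolean verify(byte[] received, String expectedMd5Hex) {
        return md5Hex(received).equalsIgnoreCase(expectedMd5Hex);
    }
}
```

 The receiver would call verify on the downloaded bytes before opening an
EditLogInputStream on them, and discard the file on a mismatch.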

> md5sum /data/{1..4}/dfs/name/current/
  If we use a system like BookKeeper, we won't have the ability to perform this sanity check
anyway. For the different file storages, this ability will continue to exist, because a) mark
will be called for all journal instances at the same time, and b) even if a file storage
implementation closes its file every 100000 transactions, the boundaries will be consistent
across all files.

> Refer to the discussion on HDFS-1073 about this property.
 Sure, I will look at it.

> Add interface for generic Write Ahead Logging mechanisms
> --------------------------------------------------------
>                 Key: HDFS-1580
>                 URL: https://issues.apache.org/jira/browse/HDFS-1580
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Ivan Kelly
>             Fix For: Edit log branch (HDFS-1073)
>         Attachments: EditlogInterface.1.pdf, HDFS-1580+1521.diff, HDFS-1580.diff, HDFS-1580.diff,
HDFS-1580.diff, generic_wal_iface.pdf, generic_wal_iface.pdf, generic_wal_iface.pdf, generic_wal_iface.txt

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
