hadoop-hdfs-issues mailing list archives

From "Suresh Srinivas (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1108) Log newly allocated blocks
Date Wed, 24 Aug 2011 21:20:29 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090499#comment-13090499 ]

Suresh Srinivas commented on HDFS-1108:

bq. Suresh, based on your comments, you should vote -1 on this patch, because this patch calls
persistBlocks only when supportAppends or hasHA. So, why not enable it every time, unless a
benchmark shows serious regression?

I do not -1 while the discussion is still in progress :-) I agree we should understand the
cost of persisting, irrespective of HA. I also think that eventually (once 0.23 is stable
enough) supportAppends could be set to true by default, in which case the optimization may
not be effective in the default configuration.
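The gating being debated can be sketched as follows. This is a hypothetical illustration, not the actual HDFS-1108 patch: the names supportAppends, hasHA, and persistBlocks follow the discussion above, but the surrounding class and counter are invented for the sketch.

```java
// Hypothetical sketch of the conditional persist discussed above.
// Not HDFS source: only the flag and method names come from the thread.
public class BlockAllocationSketch {
    final boolean supportAppends;
    final boolean hasHA;
    int persistCount = 0;   // how many times the edit log was written

    BlockAllocationSketch(boolean supportAppends, boolean hasHA) {
        this.supportAppends = supportAppends;
        this.hasHA = hasHA;
    }

    void allocateBlock(String file) {
        // ... block is allocated in the namenode's memory here ...
        if (supportAppends || hasHA) {
            persistBlocks(file);   // logged to the edit log immediately
        }
        // otherwise the allocation is only persisted later, on hflush()/close()
    }

    void persistBlocks(String file) {
        persistCount++;            // stand-in for an edit-log write
    }

    public static void main(String[] args) {
        BlockAllocationSketch plain = new BlockAllocationSketch(false, false);
        plain.allocateBlock("/f");
        System.out.println(plain.persistCount);  // allocation not logged

        BlockAllocationSketch ha = new BlockAllocationSketch(false, true);
        ha.allocateBlock("/f");
        System.out.println(ha.persistCount);     // logged at allocation time
    }
}
```

If supportAppends were later true by default, as suggested above, the conditional would almost always be taken and skipping it would buy nothing in the common case.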

bq. Suresh, yes you do repeat it. But you never answered MY question: which HA approach are
you implementing? As you see, you have to make choices even with issues that seemed to be a
common part of all approaches.

I thought it was clear. I would like to implement HA with the shared-storage approach, not
one dependent on IP failover.

bq. I like Milind's idea about an implementation "without shared storage assumption".
If that is the case, then let's also remove the BackupNode from the picture, to remove the
argument that the editlog is persisted in the BackupNode's memory.

Leaving HA out of the question, isn't not persisting block allocation an issue even with
the new append, in the following scenario:
# A block is allocated on the NN.
# The client starts writing to the block and performs a flush.
# The NN restarts at this point and has no knowledge of the new block. During lease recovery,
it closes the file (with no UnderConstruction block).

Will the above scenario result in loss of data?
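The scenario above can be simulated with a toy namenode. Everything here is hypothetical, purely to illustrate the mechanism: in-memory state is lost on "restart" unless the allocation was written to the edit log first.

```java
import java.util.*;

// Toy simulation of the restart scenario above (not HDFS code):
// anything not in the "edit log" does not survive a namenode restart.
public class NnRestartSketch {
    final List<String> editLog = new ArrayList<>();  // durable across restart
    final Set<String> blocks = new HashSet<>();      // in-memory only

    void allocateBlock(String blk, boolean persist) {
        blocks.add(blk);
        if (persist) {
            editLog.add("ADD_BLOCK " + blk);  // the persist this issue proposes
        }
    }

    void restart() {
        blocks.clear();                       // in-memory state is lost
        for (String op : editLog) {           // replay the durable log
            blocks.add(op.substring("ADD_BLOCK ".length()));
        }
    }

    public static void main(String[] args) {
        NnRestartSketch nn = new NnRestartSketch();
        nn.allocateBlock("blk_1", false);  // step 1: allocated, not logged
        // step 2: the client writes data to datanodes and flushes
        nn.restart();                      // step 3: NN restarts
        // The NN now has no record of blk_1; lease recovery would close the
        // file without it, and the flushed data becomes unreachable.
        System.out.println(nn.blocks.contains("blk_1"));  // false
    }
}
```

With persist=true at allocation time, the replayed log restores the block after restart, which is the behavior this issue proposes.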

> Log newly allocated blocks
> --------------------------
>                 Key: HDFS-1108
>                 URL: https://issues.apache.org/jira/browse/HDFS-1108
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: name-node
>            Reporter: dhruba borthakur
>            Assignee: Todd Lipcon
>             Fix For: HA branch (HDFS-1623)
>         Attachments: HDFS-1108.patch, hdfs-1108-habranch.txt, hdfs-1108.txt
> The current HDFS design says that newly allocated blocks for a file are not persisted
in the NN transaction log when the block is allocated. Instead, a hflush() or a close() on
the file persists the blocks into the transaction log. It would be nice if we could immediately
persist newly allocated blocks (as soon as they are allocated) for specific files.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

