hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1108) ability to create a file whose newly allocated blocks are automatically persisted immediately
Date Fri, 12 Aug 2011 00:09:27 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083832#comment-13083832 ]

Todd Lipcon commented on HDFS-1108:

A few questions:
- Should we call syncLog() after every block? I looked at the RPC metrics of a ~150-node cluster
running HBase and found that addBlock made up 3% of all operations and 20% of write
operations. The numbers of create() and addBlock() operations are very close to each other,
indicating that, at least on this cluster, most files consist of only one block. So we could
consider piggybacking the allocation of the first block onto the create() call; then the
first block would not require an additional fsync of the logs (and performance would improve
too).
- Should abandonBlock() maybe call persistBlocks() too?
- Should we document this new flag, or treat it as an "internal" flag used only as an override
for testing? If we determine that the overhead is small, maybe we should just always have
this behavior?
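To make the fsync argument in the first question concrete, here is a minimal sketch of a toy model, not HDFS code. ToyEditLog, its syncLog() method, and the record labels are hypothetical stand-ins, and the sketch assumes every file has exactly one block, as the metrics above suggest. It counts edit-log syncs when the first block is persisted with its own sync versus when its allocation is piggybacked onto create().

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (hypothetical, not HDFS code) counting edit-log syncs for single-block files. */
public class SyncCountSketch {

    /** Hypothetical stand-in for the NameNode edit log: buffers records and counts fsyncs. */
    static final class ToyEditLog {
        private final List<String> pending = new ArrayList<>();
        private int syncs = 0;

        /** Buffers a record in memory; nothing is durable until syncLog(). */
        void logEdit(String record) {
            pending.add(record);
        }

        /** Models one fsync: flushes all buffered records at once, if there are any. */
        void syncLog() {
            if (!pending.isEmpty()) {
                pending.clear();
                syncs++;
            }
        }

        int syncCount() {
            return syncs;
        }
    }

    /** create() is synced, then the first block allocation is synced separately. */
    static int syncsWithPerBlockSync(int files) {
        ToyEditLog log = new ToyEditLog();
        for (int i = 0; i < files; i++) {
            log.logEdit("create file" + i);
            log.syncLog();
            log.logEdit("addBlock file" + i + " blk0");
            log.syncLog();
        }
        return log.syncCount();
    }

    /** create() also allocates the first block, so one sync covers both records. */
    static int syncsWithPiggybackedFirstBlock(int files) {
        ToyEditLog log = new ToyEditLog();
        for (int i = 0; i < files; i++) {
            log.logEdit("create file" + i);
            log.logEdit("addBlock file" + i + " blk0");
            log.syncLog();
        }
        return log.syncCount();
    }

    public static void main(String[] args) {
        int files = 1000;
        System.out.println("per-block sync: " + syncsWithPerBlockSync(files));
        System.out.println("piggybacked:    " + syncsWithPiggybackedFirstBlock(files));
    }
}
```

For 1000 single-block files, main() prints 2000 syncs for the per-block strategy and 1000 for the piggybacked one. In this model only the first block can ride along with create(); any later blocks of a multi-block file would still pay one sync each under per-block syncing.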

> ability to create a file whose newly allocated blocks are automatically persisted immediately
> ---------------------------------------------------------------------------------------------
>                 Key: HDFS-1108
>                 URL: https://issues.apache.org/jira/browse/HDFS-1108
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>            Reporter: dhruba borthakur
>            Assignee: Dmytro Molkov
>         Attachments: HDFS-1108.patch, hdfs-1108-habranch.txt
> The current HDFS design says that newly allocated blocks for a file are not persisted
> in the NN transaction log when the block is allocated. Instead, an hflush() or a close() on
> the file persists the blocks into the transaction log. It would be nice if we could immediately
> persist newly allocated blocks (as soon as they are allocated) for specific files.
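The difference between the current and proposed behavior can be sketched with a toy NameNode model, assuming a per-file boolean chosen at create() time. ToyNameNode and the persistBlocksImmediately parameter are hypothetical and not the patch's actual API; persistBlocks() borrows its name from the comment above but is only a stand-in. Blocks live in memory until persisted, and a simulated restart keeps only what reached the log.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model (hypothetical, not HDFS code) of deferred vs. immediate block persistence. */
public class PersistBlocksSketch {

    static final class ToyNameNode {
        private final Map<String, List<String>> inMemoryBlocks = new HashMap<>();
        private final Map<String, List<String>> durableBlocks = new HashMap<>();
        private final Map<String, Boolean> persistImmediately = new HashMap<>();
        private int nextBlockId = 0;

        /** Hypothetical per-file flag, fixed when the file is created. */
        void create(String path, boolean persistBlocksImmediately) {
            inMemoryBlocks.put(path, new ArrayList<>());
            durableBlocks.put(path, new ArrayList<>());
            persistImmediately.put(path, persistBlocksImmediately);
        }

        /** Allocates a block in memory; logs it right away only if the file's flag is set. */
        String addBlock(String path) {
            String blockId = "blk_" + (nextBlockId++);
            inMemoryBlocks.get(path).add(blockId);
            if (persistImmediately.get(path)) {
                persistBlocks(path);
            }
            return blockId;
        }

        /** Default behavior: blocks reach the log on hflush() or close(). */
        void hflush(String path) {
            persistBlocks(path);
        }

        void close(String path) {
            persistBlocks(path);
        }

        /** Copies the file's current in-memory block list into the durable "log". */
        private void persistBlocks(String path) {
            durableBlocks.put(path, new ArrayList<>(inMemoryBlocks.get(path)));
        }

        /** Simulates a NameNode restart: only persisted blocks survive. */
        List<String> blocksAfterRestart(String path) {
            return new ArrayList<>(durableBlocks.get(path));
        }
    }

    public static void main(String[] args) {
        ToyNameNode nn = new ToyNameNode();
        nn.create("/default", false);
        nn.create("/persisted", true);
        nn.addBlock("/default");
        nn.addBlock("/persisted");
        System.out.println("/default after restart:   " + nn.blocksAfterRestart("/default"));
        System.out.println("/persisted after restart: " + nn.blocksAfterRestart("/persisted"));
    }
}
```

Here main() prints an empty list for /default and [blk_1] for /persisted: without the flag, a restart between addBlock() and the next hflush() or close() loses the newly allocated block from the log.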

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

