hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1108) Log newly allocated blocks
Date Sat, 13 Aug 2011 02:13:27 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084511#comment-13084511 ]

Todd Lipcon commented on HDFS-1108:

bq. We have to log sync after every allocation right, or do you mean only when HA is enabled?
We could consider batching syncs at the cost of response latency

Syncs are already somewhat batched inside FSEditLog, but syncing does make the RPC take longer,
holds up the thread, etc. We currently log addBlock calls but don't call logSync, presumably to
save on performance. But that comes at the expense of correctness, which I think is a mistake.
I'll add logSync for now, and if people find a regression we can add a config to disable it.
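
To make the batching point concrete, here is a rough sketch of how one sync can cover several buffered edits. It is not the real FSEditLog: the class SketchEditLog, its fields, and the logSync(long) signature are all made up for illustration, and the "disk" is just an in-memory list.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an edit log; NOT the real FSEditLog API. */
class SketchEditLog {
    private final List<String> buffered = new ArrayList<>(); // logged, not yet durable
    private final List<String> durable = new ArrayList<>();  // stands in for the on-disk log
    private long lastWrittenTxId = 0;
    private long lastSyncedTxId = 0;
    private int syncCount = 0;

    /** Buffer an edit and return its transaction id. Cheap: no I/O happens here. */
    synchronized long logEdit(String op) {
        buffered.add(op);
        return ++lastWrittenTxId;
    }

    /** Block until everything up to txId is durable. */
    synchronized void logSync(long txId) {
        if (txId <= lastSyncedTxId) {
            return; // an earlier sync already covered this edit
        }
        // One flush makes every buffered edit durable, not only the caller's.
        durable.addAll(buffered);
        buffered.clear();
        lastSyncedTxId = lastWrittenTxId;
        syncCount++;
    }

    synchronized int syncCount() { return syncCount; }
    synchronized int durableCount() { return durable.size(); }
}
```

In this sketch, two addBlock edits followed by two logSync calls cost a single flush: the second caller finds its transaction already synced and returns immediately. The caller still waits for durability before replying, which is where the extra RPC latency comes from.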

bq. Piggy backing the initial allocation on the create makes a lot of sense
Agreed, let's do that separately.

bq.  to why DNs are deleting blocks that it currently thinks are part of the file
abandonBlock happens if the pipeline fails to be established at all, so the DNs never even
get the blocks. So it only serves to prevent a follow-up addBlock from adding on top of the
abandoned block. But I think you're right that it has to be logged as well... at worst,
there's no harm in it.
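
A toy illustration of why the abandon probably needs to be logged once allocations are: if only the allocation reaches the log, replay ends up with a block the client gave up on. The op strings ("add:<id>", "abandon:<id>") and the ReplaySketch class are invented for this example and are not the real edit log format.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical replay of a single file's block ops; not the real edit log loader. */
class ReplaySketch {
    /** Rebuild the file's block list from ops of the form "add:<id>" or "abandon:<id>". */
    static List<String> replay(List<String> ops) {
        List<String> blocks = new ArrayList<>();
        for (String op : ops) {
            String[] parts = op.split(":", 2);
            if (parts[0].equals("add")) {
                blocks.add(parts[1]);
            } else if (parts[0].equals("abandon")) {
                blocks.remove(parts[1]); // drop the block whose pipeline never came up
            }
        }
        return blocks;
    }
}
```

Replaying add blk_1, abandon blk_1, add blk_2 leaves only blk_2 in the file. Drop the abandon from the log and replay leaves blk_1 in the file ahead of blk_2, even though no DN ever received it.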

> Log newly allocated blocks
> --------------------------
>                 Key: HDFS-1108
>                 URL: https://issues.apache.org/jira/browse/HDFS-1108
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: name-node
>            Reporter: dhruba borthakur
>            Assignee: Todd Lipcon
>             Fix For: HA branch (HDFS-1623)
>         Attachments: HDFS-1108.patch, hdfs-1108-habranch.txt
> The current HDFS design says that newly allocated blocks for a file are not persisted
> in the NN transaction log when the block is allocated. Instead, a hflush() or a close() on
> the file persists the blocks into the transaction log. It would be nice if we can immediately
> persist newly allocated blocks (as soon as they are allocated) for specific files.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

