hadoop-common-dev mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3113) Provide a configurable way for DFSOutputStream.flush() to flush data to real block file on DataNode.
Date Sun, 30 Mar 2008 06:05:24 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12583413#action_12583413 ]

dhruba borthakur commented on HADOOP-3113:

I can make this configurable per file, but it probably makes more sense to make it apply
to the entire system. The reason is that datanodes do not know much about the name of the
HDFS file that a block belongs to; making this configurable "per file" would require many
protocol changes.
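
A minimal sketch of how a single cluster-wide flag might be read on the datanode side. The
property name here is hypothetical (the actual key introduced by a patch may differ):

    import org.apache.hadoop.conf.Configuration;

    public class FlushConfigSketch {
      // Hypothetical key name; the actual property may differ.
      private static final String FLUSH_TO_BLOCK_FILE_KEY =
          "dfs.datanode.flush.to.real.block.file";

      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // A single cluster-wide flag: every datanode reads the same value,
        // so no per-file information (and no protocol change) is required.
        boolean flushToBlockFile = conf.getBoolean(FLUSH_TO_BLOCK_FILE_KEY, false);
        System.out.println("Write directly to real block file: " + flushToBlockFile);
      }
    }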

The default for block reports is 1 hour, and the default for the lease timeout is also 1 hour.
This needs to be changed so that block reports are sent every 30 minutes.

If a client dies while writing to the last block of a file, that block is not yet part
of the blocksMap in the namenode. (A block gets inserted into the blocksMap when a complete
block is received by the datanode and it sends a blockReceived message to the namenode.) If
the lease for this file on the namenode expires before the block report from the datanode
arrives, the namenode will erroneously conclude that no datanode has a copy of that block.
As part of lease recovery, the namenode will then delete the last block of the file because it
has no entry in the blocksMap. To prevent this from occurring, the block report periodicity
should be set to 30 minutes.
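
A minimal sketch of lowering the interval programmatically, assuming dfs.blockreport.intervalMsec
is the relevant property name in this version:

    import org.apache.hadoop.conf.Configuration;

    public class BlockReportIntervalSketch {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Both the default block report interval and the lease timeout are
        // 1 hour; sending reports every 30 minutes ensures a report arrives
        // before the lease can expire. 30 minutes = 30 * 60 * 1000 ms.
        conf.setLong("dfs.blockreport.intervalMsec", 30L * 60L * 1000L);
        System.out.println("Block report interval (ms): "
            + conf.getLong("dfs.blockreport.intervalMsec", 0L));
      }
    }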

> Provide a configurable way for DFSOutputStream.flush() to flush data to real block file
on DataNode.
> ---------------------------------------------------------------------------------------------------
>                 Key: HADOOP-3113
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3113
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>            Reporter: dhruba borthakur
> DFSOutputStream has a method called flush() that persists block locations on the namenode
and sends all outstanding data to all datanodes in the pipeline. However, this data goes to
a tmp file on the datanode(s). When the block is closed, the tmp file is renamed to the
real block file. If the datanode(s) die before the block is complete, the entire block
is lost. This behaviour will be fixed in HADOOP-1700.
> However, in the short term, a configuration parameter can be used to allow datanodes
to write to the real block file directly, thereby avoiding writing to the tmp file. This means
that data that is flushed successfully by a client does not get lost even if the datanode(s)
or client dies.
> The Namenode already has code to pick the largest replica (if multiple datanodes have
different sizes of this block). Also, the namenode has code to not trigger replication requests
if the file is still being written to.
> The only caveat that I can think of is that the block report periodicity should be much,
much smaller than the lease timeout period. A block report adds the being-written-to blocks
to the blocksMap, thereby avoiding any cleanup that lease-expiry processing might otherwise perform.
> Not all requirements specified by HADOOP-1700 are supported by this approach, but it
could still be helpful (in the short term) for a wide range of applications.
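
For context on the flush() semantics described above, a minimal client-side sketch (the path
and payload are illustrative; the durability claim assumes the proposed configuration is enabled):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class FlushUsageSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Illustrative path and data.
        FSDataOutputStream out = fs.create(new Path("/tmp/flush-demo.log"));
        out.write("partial record".getBytes());

        // flush() persists block locations on the namenode and pushes
        // outstanding data down the datanode pipeline. With the proposed
        // option enabled, the data lands in the real block file rather
        // than a tmp file, so it survives a datanode or client crash.
        out.flush();

        out.close();
        fs.close();
      }
    }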

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
