hadoop-common-dev mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-3113) Provide a configurable way for DFSOutputStream.flush() to flush data to real block file on DataNode.
Date Thu, 27 Mar 2008 22:34:25 GMT
Provide a configurable way for DFSOutputStream.flush() to flush data to real block file on DataNode.
----------------------------------------------------------------------------------------------------

                 Key: HADOOP-3113
                 URL: https://issues.apache.org/jira/browse/HADOOP-3113
             Project: Hadoop Core
          Issue Type: Bug
          Components: dfs
            Reporter: dhruba borthakur


DFSOutputStream has a method called flush() that persists block locations on the namenode
and sends all outstanding data to all datanodes in the pipeline. However, this data goes to
the tmp file on the datanode(s). When the block is closed, the tmp file is renamed to become
the real block file. If the datanode(s) die before the block is complete, the entire block
is lost. This behaviour will be fixed in HADOOP-1700.

However, in the short term, a configuration parameter can be used to allow datanodes to write
directly to the real block file, bypassing the tmp file. This means that data a client has
flushed successfully is not lost even if the datanode(s) or the client dies.
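To make the difference concrete, here is a minimal Java sketch of a datanode-side block writer. It assumes a hypothetical boolean `writeToRealFile` standing in for the proposed configuration parameter (no real parameter name is given here), uses plain `java.nio.file` calls rather than Hadoop's datanode classes, and the `tmp`/`current` directory names are illustrative. With the flag off, flushed bytes sit in the tmp file until `close()` renames it; with the flag on, they land in the real block file immediately, so a simulated crash before `close()` leaves them recoverable.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/**
 * Toy model of a datanode block writer. writeToRealFile is a HYPOTHETICAL
 * stand-in for the proposed configuration parameter, not a Hadoop setting.
 */
public class BlockWriterSketch {
    private final Path tmpFile;
    private final Path realFile;
    private final boolean writeToRealFile;

    public BlockWriterSketch(Path dataDir, String blockName, boolean writeToRealFile) {
        this.tmpFile = dataDir.resolve("tmp").resolve(blockName);
        this.realFile = dataDir.resolve("current").resolve(blockName);
        this.writeToRealFile = writeToRealFile;
    }

    /** Where flushed data goes: the real block file or the tmp file. */
    private Path target() {
        return writeToRealFile ? realFile : tmpFile;
    }

    /** Appends flushed client data to the target file (no fsync: models a process crash, not power loss). */
    public void flush(byte[] data) throws IOException {
        Path t = target();
        Files.createDirectories(t.getParent());
        Files.write(t, data, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    /** Block complete: in tmp mode, rename the tmp file to the real block file. */
    public void close() throws IOException {
        if (!writeToRealFile && Files.exists(tmpFile)) {
            Files.createDirectories(realFile.getParent());
            Files.move(tmpFile, realFile);
        }
    }

    /** Bytes a restarted datanode would find in the real block file. */
    public long recoverableBytes() throws IOException {
        return Files.exists(realFile) ? Files.size(realFile) : 0L;
    }

    public static void main(String[] args) throws IOException {
        byte[] chunk = "flushed-data".getBytes(StandardCharsets.UTF_8);
        for (boolean direct : new boolean[] {false, true}) {
            Path dir = Files.createTempDirectory("dn");
            BlockWriterSketch w = new BlockWriterSketch(dir, "blk_1", direct);
            w.flush(chunk);
            // Simulated datanode crash: close() is never called.
            System.out.println("direct=" + direct + " recoverable=" + w.recoverableBytes());
        }
    }
}
```

Running `main` should print `direct=false recoverable=0` and then `direct=true recoverable=12`: only the direct-write mode leaves the flushed bytes in the real block file after the simulated crash.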

The namenode already has code to pick the largest replica (if multiple datanodes hold replicas
of this block with different sizes). The namenode also does not trigger replication requests
while the file is still being written to.
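The two namenode behaviours can be modelled in a few lines. This is a minimal sketch, not the namenode's actual code: `ReplicaReport`, `pickLargest`, and `shouldReplicate` are hypothetical names, and block identity, generation stamps, and replica validation are ignored. Choosing the longest replica keeps the most flushed data, and suppressing replication while the file is under construction presumably avoids copying a block that is still growing.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Toy model of replica selection; all names here are hypothetical, not Hadoop classes. */
public class ReplicaChooserSketch {

    /** A replica length reported by one datanode for a single block. */
    public static final class ReplicaReport {
        public final String datanode;
        public final long length;

        public ReplicaReport(String datanode, long length) {
            this.datanode = datanode;
            this.length = length;
        }
    }

    /** Picks the longest replica, i.e. the one holding the most flushed data. */
    public static Optional<ReplicaReport> pickLargest(List<ReplicaReport> reports) {
        return reports.stream().max(Comparator.comparingLong((ReplicaReport r) -> r.length));
    }

    /** No re-replication while the file is still open for writing. */
    public static boolean shouldReplicate(boolean fileUnderConstruction, int liveReplicas, int targetReplication) {
        return !fileUnderConstruction && liveReplicas < targetReplication;
    }

    public static void main(String[] args) {
        List<ReplicaReport> reports = List.of(
                new ReplicaReport("dn1", 4096),
                new ReplicaReport("dn2", 8192),
                new ReplicaReport("dn3", 6144));
        System.out.println(pickLargest(reports).map(r -> r.datanode).orElse("none")); // prints dn2
        System.out.println(shouldReplicate(true, 1, 3));  // prints false
        System.out.println(shouldReplicate(false, 1, 3)); // prints true
    }
}
```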

The only caveat that I can think of is that the block report interval should be much smaller
than the lease timeout period. A block report adds blocks that are still being written to the
blocksMap, thereby preventing the cleanup that lease-expiry processing might otherwise have
done.
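The timing caveat can be phrased as a simple configuration check. This sketch assumes a hypothetical `safetyFactor` to turn "much smaller" into a number and takes both durations as plain milliseconds; it does not read any real Hadoop configuration key, and the one-hour lease timeout in `main` is only an illustrative value. The reasoning: a block that starts being written just after a block report is not reported until one full interval later, so the interval, with margin for delayed reports, must fit inside the lease timeout for the block to reach the blocksMap before lease-expiry processing runs.

```java
/** Sketch of the block-report vs. lease-timeout check; names and factor are hypothetical. */
public class ReportIntervalCheckSketch {

    /**
     * Worst case: a block starts just after a report, so the next report
     * arrives one full interval later. Require interval * safetyFactor to
     * fit within the lease timeout.
     */
    public static boolean intervalIsSafe(long blockReportIntervalMs, long leaseTimeoutMs, int safetyFactor) {
        if (blockReportIntervalMs <= 0 || leaseTimeoutMs <= 0 || safetyFactor < 1) {
            throw new IllegalArgumentException("durations must be positive and safetyFactor >= 1");
        }
        // long arithmetic: the factor widens to long before multiplying
        return blockReportIntervalMs * safetyFactor <= leaseTimeoutMs;
    }

    public static void main(String[] args) {
        long leaseTimeoutMs = 60L * 60 * 1000; // illustrative: 1 hour
        System.out.println(intervalIsSafe(60L * 60 * 1000, leaseTimeoutMs, 10)); // prints false
        System.out.println(intervalIsSafe(5L * 60 * 1000, leaseTimeoutMs, 10));  // prints true
    }
}
```

With a factor of 10, a report interval equal to the lease timeout fails the check, while a five-minute interval passes (50 minutes of margin-adjusted delay fits within one hour).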

Not all requirements specified by HADOOP-1700 are supported by this approach, but it could
still be helpful (in the short term) for a wide range of applications.





-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

