hadoop-common-dev mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1700) Append to files in HDFS
Date Wed, 26 Sep 2007 19:12:51 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12530527 ]

dhruba borthakur commented on HADOOP-1700:
------------------------------------------

Block data is transmitted in Step 3 above. Each datanode persists the DataGenerationStamp
as soon as it receives it, and only then starts receiving the data. If a datanode crashes before
it has received all the data, the client detects this condition and increments the
DataGenerationStamp on the namenode and on the remaining good datanodes.
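
A minimal Java sketch of the client-side step just described, not actual HDFS code: the
class name, method name, and the map of per-datanode stamps are all hypothetical. It only
illustrates that the stamp is bumped on the namenode and on the surviving datanodes, so the
crashed datanode is left holding the old stamp.

```java
import java.util.Map;
import java.util.Set;

public class ClientRecoverySketch {
    /**
     * Hypothetical helper: bump the generation stamp after a datanode failure.
     *
     * @param namenodeStamp  stamp currently recorded on the namenode
     * @param datanodeStamps stamp persisted on each datanode, keyed by datanode name
     * @param failed         datanodes the client detected as crashed
     * @return the new stamp, as it would be recorded on the namenode
     */
    static long bumpStamp(long namenodeStamp,
                          Map<String, Long> datanodeStamps,
                          Set<String> failed) {
        long next = namenodeStamp + 1;
        for (Map.Entry<String, Long> e : datanodeStamps.entrySet()) {
            // Only the remaining good datanodes receive the new stamp;
            // a crashed datanode keeps whatever stamp it persisted earlier.
            if (!failed.contains(e.getKey())) {
                e.setValue(next);
            }
        }
        return next;
    }
}
```

In this sketch a replica whose stamp lags the namenode's stamp is distinguishable from the
up-to-date ones; how the real system would treat such a replica is not specified here.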

Now consider the case where the client crashes before the current write (or flush) has been
successfully transmitted to all datanodes. The datanodes may then hold replicas of this block
with different sizes. The lease eventually expires on the namenode, and it becomes the namenode's
duty to do the recovery that the client would otherwise have done. The namenode fetches from
the datanodes the size of each replica of the block under modification and selects the largest
replica as valid. It increments the DataGenerationStamp and sends the new stamp to the
datanodes that hold a replica of the largest size.
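
The lease-expiry recovery above can be sketched roughly as follows. This is an illustration
under stated assumptions, not the namenode implementation: the class, the Recovery type, and
the map of replica sizes are hypothetical names introduced for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class LeaseRecoverySketch {
    /** Hypothetical result type: the new stamp and the datanodes that receive it. */
    static final class Recovery {
        final long newStamp;
        final List<String> validDatanodes;

        Recovery(long newStamp, List<String> validDatanodes) {
            this.newStamp = newStamp;
            this.validDatanodes = validDatanodes;
        }
    }

    /**
     * @param currentStamp stamp recorded on the namenode before recovery
     * @param replicaSizes replica size reported by each datanode, keyed by datanode name
     */
    static Recovery recover(long currentStamp, Map<String, Long> replicaSizes) {
        // Find the largest replica size reported by any datanode.
        long largest = -1;
        for (long size : replicaSizes.values()) {
            largest = Math.max(largest, size);
        }
        // Datanodes holding a replica of that size are treated as valid.
        List<String> valid = new ArrayList<>();
        for (Map.Entry<String, Long> e : replicaSizes.entrySet()) {
            if (e.getValue() == largest) {
                valid.add(e.getKey());
            }
        }
        // Increment the stamp; only the valid datanodes would be sent the new value.
        return new Recovery(currentStamp + 1, valid);
    }
}
```

For example, with reported sizes dn1=4096, dn2=1024, dn3=4096 and a current stamp of 7, the
sketch selects dn1 and dn3 and produces stamp 8.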

What happens if the namenode crashes before it could complete the entire lease-timeout-triggered-recovery
(described above) for blocks that were being modified? I am thinking about this one.

> Append to files in HDFS
> -----------------------
>
>                 Key: HADOOP-1700
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1700
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: stack
>
> Request for being able to append to files in HDFS has been raised a couple of times on
the list of late.   For one example, see http://www.nabble.com/HDFS%2C-appending-writes-status-tf3848237.html#a10916193.
 Other mail describes folks' workarounds because this feature is lacking: e.g. http://www.nabble.com/Loading-data-into-HDFS-tf4200003.html#a12039480
(Later on this thread, Jim Kellerman re-raises the HBase need of this feature).  HADOOP-337
'DFS files should be appendable' makes mention of file append but it was opened early in the
life of HDFS when the focus was more on implementing the basics rather than adding new features.
 Interest fizzled.  Because HADOOP-337 is also a bit of a grab-bag -- it includes truncation
and being able to concurrently read/write -- rather than try and breathe new life into HADOOP-337,
instead, here is a new issue focused on file append.  Ultimately, being able to do as the
google GFS paper describes -- having multiple concurrent clients making 'Atomic Record Append'
to a single file would be sweet but at least for a first cut at this feature, IMO, a single
client appending to a single HDFS file letting the application manage the access would be
sufficient.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

