hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1700) Append to files in HDFS
Date Wed, 05 Sep 2007 20:33:34 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12525213 ]

Doug Cutting commented on HADOOP-1700:

Ah. I think you're intending that datanodes do persist the block timestamp, but the namenode
does not? If so, I'd missed that. I thought you'd said that timestamps were not persisted.
But if datanodes persist them, then that could indeed help detect block id collisions, since
a timestamp collision would be nearly impossible (assuming clocks are reasonably synchronized
and accurate). So would DFSClient send the timestamp with each flushed buffer, and the datanode
record it in a log that's replayed on startup?
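To make the question concrete, here is a minimal sketch of the scheme it describes: each flushed buffer carries a client timestamp, the datanode appends a (block id, timestamp) record to a durable log, and on restart it replays that log to rebuild the newest timestamp per block. All class and method names (`BlockTimestampLog`, `onFlush`, `restart`, `matches`) are hypothetical, an in-memory list stands in for the on-disk log, and rejecting an older timestamp is just one assumed way a datanode might react to a stale writer or an id collision. This is not HDFS code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Demo first so the file can be launched directly with `java File.java`.
class TimestampLogDemo {
    public static void main(String[] args) {
        BlockTimestampLog log = new BlockTimestampLog();
        log.onFlush(7L, 100L);
        log.onFlush(7L, 150L);
        log.restart();                               // replay the log
        System.out.println(log.latestTimestamp(7L)); // prints 150
        System.out.println(log.matches(7L, 100L));   // prints false
    }
}

/** One log entry: a block id plus the client timestamp of one flushed buffer. */
final class TimestampRecord {
    final long blockId;
    final long timestamp;

    TimestampRecord(long blockId, long timestamp) {
        this.blockId = blockId;
        this.timestamp = timestamp;
    }
}

/** Hypothetical datanode-side timestamp log; the List stands in for an on-disk file. */
final class BlockTimestampLog {
    private final List<TimestampRecord> durableLog = new ArrayList<>();
    private final Map<Long, Long> latest = new HashMap<>();

    /** Handles one flushed buffer: validate, append to the log, then update memory. */
    void onFlush(long blockId, long timestamp) {
        Long known = latest.get(blockId);
        if (known != null && timestamp < known) {
            // An older timestamp for a known id points to a stale writer or an id collision.
            throw new IllegalStateException(
                "block " + blockId + ": timestamp " + timestamp + " is older than " + known);
        }
        durableLog.add(new TimestampRecord(blockId, timestamp));
        latest.put(blockId, timestamp);
    }

    /** Simulates a restart: discard in-memory state and rebuild it by replaying the log. */
    void restart() {
        latest.clear();
        for (TimestampRecord r : durableLog) {
            latest.merge(r.blockId, r.timestamp, Long::max);
        }
    }

    /** Latest persisted timestamp for a block, or null if the block is unknown. */
    Long latestTimestamp(long blockId) {
        return latest.get(blockId);
    }

    /** True only if a reported (id, timestamp) pair matches what was persisted. */
    boolean matches(long blockId, long timestamp) {
        Long known = latest.get(blockId);
        return known != null && known == timestamp;
    }
}
```

Because the pair (id, timestamp) is what gets compared, two different blocks that happen to share an id would almost never also share a timestamp, which is the property the comment relies on.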

As for new opportunities for corruption, I simply meant that having multiple versions of a
block increases the chances of getting the wrong version. The namenode and datanode will
need substantial new logic to handle block versioning, and more logic means more chances
for faulty logic, new failure modes, etc. The proposal I made requires far fewer
fundamental changes to block semantics, and thus mostly builds on already-debugged logic.
Changing blocks from immutable to mutable will require us to uncover all the places where
we've assumed immutability. Some of these may not be obvious. That's all I meant. Just
merrily spreading a little FUD about change of any sort!
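A small illustration of the "wrong version" risk: when blocks are immutable, any replica with the right block id is as good as any other, but once blocks are versioned, a code path that still picks replicas by id alone can hand a reader stale data. The sketch below assumes hypothetical `Replica` reports tagged with a per-block version number (a stand-in for whatever timestamp or generation scheme is adopted). `firstWithId` models a leftover immutability assumption and `current` models the version-aware check. Neither is HDFS code.

```java
import java.util.ArrayList;
import java.util.List;

// Demo first so the file can be launched directly with `java File.java`.
class ReplicaVersionDemo {
    public static void main(String[] args) {
        List<Replica> reports = new ArrayList<>();
        reports.add(new Replica(5L, 1L, "dn1")); // this replica missed the latest append
        reports.add(new Replica(5L, 2L, "dn2"));
        reports.add(new Replica(5L, 2L, "dn3"));
        System.out.println(ReplicaSelector.firstWithId(5L, reports).datanode); // prints dn1 (stale)
        for (Replica r : ReplicaSelector.current(5L, reports)) {
            System.out.println(r.datanode); // prints dn2, then dn3
        }
    }
}

/** A replica as reported by one datanode: block id, version, and location. */
final class Replica {
    final long blockId;
    final long version;
    final String datanode;

    Replica(long blockId, long version, String datanode) {
        this.blockId = blockId;
        this.version = version;
        this.datanode = datanode;
    }
}

final class ReplicaSelector {
    /** Immutable-block assumption: the first replica with the right id is taken as correct. */
    static Replica firstWithId(long blockId, List<Replica> reports) {
        for (Replica r : reports) {
            if (r.blockId == blockId) return r;
        }
        return null;
    }

    /** Versioned blocks: keep only replicas of the newest version; the rest are stale. */
    static List<Replica> current(long blockId, List<Replica> reports) {
        long newest = Long.MIN_VALUE;
        for (Replica r : reports) {
            if (r.blockId == blockId) newest = Math.max(newest, r.version);
        }
        List<Replica> result = new ArrayList<>();
        for (Replica r : reports) {
            if (r.blockId == blockId && r.version == newest) result.add(r);
        }
        return result;
    }
}
```

Every call site shaped like `firstWithId` is one of the "places where we've assumed immutability" that would have to be found and converted.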

> Append to files in HDFS
> -----------------------
>                 Key: HADOOP-1700
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1700
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: stack
> Requests to be able to append to files in HDFS have been raised a couple of times on
> the list of late. For one example, see http://www.nabble.com/HDFS%2C-appending-writes-status-tf3848237.html#a10916193.
> Other mail describes folks' workarounds because this feature is lacking: e.g. http://www.nabble.com/Loading-data-into-HDFS-tf4200003.html#a12039480
> (Later on this thread, Jim Kellerman re-raises the HBase need for this feature.) HADOOP-337
> 'DFS files should be appendable' makes mention of file append, but it was opened early in the
> life of HDFS when the focus was more on implementing the basics rather than adding new features.
> Interest fizzled. Because HADOOP-337 is also a bit of a grab-bag -- it includes truncation
> and being able to concurrently read/write -- rather than try to breathe new life into HADOOP-337,
> here instead is a new issue focused on file append. Ultimately, being able to do as the
> Google GFS paper describes -- having multiple concurrent clients making 'Atomic Record Append'
> to a single file -- would be sweet, but at least for a first cut at this feature, IMO, a single
> client appending to a single HDFS file letting the application manage the access would be

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
