hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1700) Append to files in HDFS
Date Thu, 30 Aug 2007 19:26:34 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12523919 ]

Doug Cutting commented on HADOOP-1700:

> A block could have a generation number that gets incremented every time the block is
accessed for modification. 

But isn't <blockid+generation> really tantamount to a new block id?  How is this any
different from simply hard-linking to the old block file and then modifying it?
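The equivalence being argued here can be made concrete: a <blockid+generation> pair is just a wider key, and bumping the generation produces a distinct identity, exactly as allocating a fresh block id would.  A minimal sketch (names are illustrative, not actual HDFS code):

```java
public class BlockIdentity {
    // Hypothetical illustration: a (blockId, generation) pair behaves as a
    // single composite identifier.  Incrementing the generation yields an
    // identity that compares unequal to the old one -- i.e., a new block id.
    final long blockId;
    final long generation;

    BlockIdentity(long blockId, long generation) {
        this.blockId = blockId;
        this.generation = generation;
    }

    // A modification stamps out a new identity by bumping the generation.
    BlockIdentity bumpGeneration() {
        return new BlockIdentity(blockId, generation + 1);
    }

    @Override public boolean equals(Object o) {
        return o instanceof BlockIdentity
                && ((BlockIdentity) o).blockId == blockId
                && ((BlockIdentity) o).generation == generation;
    }

    @Override public int hashCode() {
        return Long.hashCode(blockId * 31 + generation);
    }

    public static void main(String[] args) {
        BlockIdentity v1 = new BlockIdentity(1700L, 1L);
        BlockIdentity v2 = v1.bumpGeneration();
        System.out.println(v1.equals(v2)); // prints false: a distinct identity
    }
}
```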

> Making a block immutable makes appends and truncates very heavyweight. You have to copy
on average blocksize*2*num_replicas bytes to make any sort of modification.

The copies could be local to each datanode, not across the wire.  They could be made from
one drive to another, or, in the case of append, you might hard-link to the old block file
if the client is sure never to read past block end.  So append need not be heavyweight
at all.
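The hard-link option above can be sketched in a few lines: the "new" block file is a hard link to the old one (no bytes are copied), and the append simply extends it in place.  A reader that pinned the old block length never reads past it.  This is an illustrative sketch, not datanode code; the file naming and the helper are assumptions:

```java
import java.io.IOException;
import java.nio.file.*;

public class AppendViaHardLink {
    // Hypothetical helper: "append" to a block by hard-linking the old block
    // file under a new generation-stamped name and extending it in place.
    // No data is copied; both names refer to the same inode.
    static Path appendToBlock(Path oldBlock, long newGeneration, byte[] data)
            throws IOException {
        Path newBlock = oldBlock.resolveSibling(
                oldBlock.getFileName() + "_gen" + newGeneration);
        Files.createLink(newBlock, oldBlock);           // hard link, not a copy
        Files.write(newBlock, data, StandardOpenOption.APPEND);
        return newBlock;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("blk");
        Path blk = dir.resolve("blk_1700");
        Files.write(blk, "old".getBytes());
        long oldLen = Files.size(blk);

        Path newBlk = appendToBlock(blk, 2, "new".getBytes());
        // Same inode: both names now show the extended length.  A reader
        // holding the old length simply stops at that offset.
        System.out.println(Files.size(newBlk) == oldLen + 3); // prints true
    }
}
```

The safety condition is exactly the one stated in the message: this only works if clients never read past the block end they were told about.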

If you're willing to possibly break existing readers while truncating, then you could always
hard-link block files and never perform any copying.  The file could be corrupted, e.g.,
if a truncate succeeds on the datanode but the block list on the namenode is not updated to
reflect that, or vice versa, but I think the blockid+revision approach has exactly the same
issue.  Copying could prevent such issues, making modifications atomic, but at some cost to
performance.
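The copy-for-atomicity alternative mentioned here amounts to truncating into a temporary copy and then atomically renaming it over the old block file, so a reader sees either the old block or the new one, never a half-truncated file.  A hedged sketch (the helper and file layout are assumptions, not HDFS internals):

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.*;

public class CopyTruncate {
    // Hypothetical sketch: make truncation atomic at the cost of a copy.
    // Truncate a temporary copy, then rename it over the original; on POSIX
    // filesystems rename atomically replaces the target.
    static void truncateBlock(Path block, long newLength) throws IOException {
        Path tmp = block.resolveSibling(block.getFileName() + ".tmp");
        Files.copy(block, tmp, StandardCopyOption.REPLACE_EXISTING); // the cost
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
            ch.truncate(newLength);
        }
        Files.move(tmp, block, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("blk");
        Path blk = dir.resolve("blk_1700");
        Files.write(blk, new byte[]{1, 2, 3, 4, 5});
        truncateBlock(blk, 2);
        System.out.println(Files.size(blk)); // prints 2
    }
}
```

The trade-off is the one the message names: atomicity is bought with a full block copy, which the hard-link scheme avoids.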

> Append to files in HDFS
> -----------------------
>                 Key: HADOOP-1700
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1700
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: stack
> Request for being able to append to files in HDFS has been raised a couple of times on
> the list of late.  For one example, see http://www.nabble.com/HDFS%2C-appending-writes-status-tf3848237.html#a10916193.
> Other mail describes folks' workarounds because this feature is lacking: e.g. http://www.nabble.com/Loading-data-into-HDFS-tf4200003.html#a12039480
> (later on this thread, Jim Kellerman re-raises the HBase need for this feature).  HADOOP-337,
> 'DFS files should be appendable', mentions file append, but it was opened early in the
> life of HDFS, when the focus was more on implementing the basics than on adding new features,
> and interest fizzled.  Because HADOOP-337 is also a bit of a grab-bag -- it includes truncation
> and concurrent read/write -- rather than try to breathe new life into HADOOP-337, here is a
> new issue focused on file append.  Ultimately, having multiple concurrent clients make
> 'Atomic Record Append' to a single file, as the Google GFS paper describes, would be sweet,
> but for a first cut at this feature, IMO, a single client appending to a single HDFS file,
> letting the application manage the access, would be sufficient.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
