hadoop-common-dev mailing list archives

From "Sameer Paranjpye (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1700) Append to files in HDFS
Date Fri, 31 Aug 2007 03:28:31 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12523987 ]

Sameer Paranjpye commented on HADOOP-1700:

> But isn't <blockid+generation> really tantamount to a new block id?

Not really; the implications for implementation are quite different. If a new block id is
to be used, the Namenode has to allocate a new block and delete the old one. Scheduling
the old block's replicas for deletion, dispatching the requests, and journaling the new block
add up to a non-trivial amount of Namenode activity. A revision number update can simply be
recorded in memory. In the event of a conflict, the Namenode would treat the replicas with the
highest revision number as valid and discard out-of-date replicas.
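
The conflict rule above can be sketched roughly as follows. This is only an illustration
of "highest revision wins", not actual Namenode code: the class ReplicaRevisions, the
Replica type, and the method names are all hypothetical, and the sketch assumes each
datanode reports a single revision number per replica of a given block.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of revision-based replica validation; not HDFS code. */
public class ReplicaRevisions {

    /** One reported replica of a block: the datanode holding it and its revision. */
    static final class Replica {
        final String datanode;
        final long revision;

        Replica(String datanode, long revision) {
            this.datanode = datanode;
            this.revision = revision;
        }
    }

    /** Returns the highest revision among the reported replicas, or -1 if there are none. */
    static long latestRevision(List<Replica> reported) {
        long latest = -1;
        for (Replica r : reported) {
            latest = Math.max(latest, r.revision);
        }
        return latest;
    }

    /** Replicas carrying the highest revision number are treated as valid. */
    static List<Replica> validReplicas(List<Replica> reported) {
        long latest = latestRevision(reported);
        List<Replica> valid = new ArrayList<>();
        for (Replica r : reported) {
            if (r.revision == latest) {
                valid.add(r);
            }
        }
        return valid;
    }

    /** Replicas with any lower revision are out of date and can be discarded. */
    static List<Replica> staleReplicas(List<Replica> reported) {
        long latest = latestRevision(reported);
        List<Replica> stale = new ArrayList<>();
        for (Replica r : reported) {
            if (r.revision < latest) {
                stale.add(r);
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        // dn3 missed the last update, so its replica is one revision behind.
        List<Replica> reported = Arrays.asList(
                new Replica("dn1", 3), new Replica("dn2", 3), new Replica("dn3", 2));
        for (Replica r : validReplicas(reported)) {
            System.out.println("valid: " + r.datanode + " rev " + r.revision);
        }
        for (Replica r : staleReplicas(reported)) {
            System.out.println("discard: " + r.datanode + " rev " + r.revision);
        }
    }
}
```

Nothing in this sketch allocates a new block id or journals anything; the only state
compared is the per-replica revision number, which is the point of the argument above.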

> Copying could prevent such issues [ ... ]

Copying does make error handling somewhat easier, but it seems to me that it does so only
when changes to a file are exposed in the Namenode at block granularity. If we want to make
changes visible at a finer grain, both approaches have similar complexity in the corner cases
of datanodes and writers crashing in the middle of updates.

> Append to files in HDFS
> -----------------------
>                 Key: HADOOP-1700
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1700
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: stack
> Request for being able to append to files in HDFS has been raised a couple of times on
> the list of late. For one example, see http://www.nabble.com/HDFS%2C-appending-writes-status-tf3848237.html#a10916193.
> Other mail describes folks' workarounds because this feature is lacking: e.g. http://www.nabble.com/Loading-data-into-HDFS-tf4200003.html#a12039480
> (Later on this thread, Jim Kellerman re-raises the HBase need of this feature). HADOOP-337
> 'DFS files should be appendable' makes mention of file append but it was opened early in the
> life of HDFS when the focus was more on implementing the basics rather than adding new features.
> Interest fizzled. Because HADOOP-337 is also a bit of a grab-bag -- it includes truncation
> and being able to concurrently read/write -- rather than try and breathe new life into HADOOP-337,
> instead, here is a new issue focused on file append. Ultimately, being able to do as the
> Google GFS paper describes -- having multiple concurrent clients making 'Atomic Record Append'
> to a single file would be sweet but at least for a first cut at this feature, IMO, a single
> client appending to a single HDFS file letting the application manage the access would be

