From: "Todd Lipcon (Commented) (JIRA)"
To: hdfs-issues@hadoop.apache.org
Reply-To: hdfs-issues@hadoop.apache.org
Date: Tue, 20 Mar 2012 22:49:40 +0000 (UTC)
Subject: [jira] [Commented] (HDFS-3107) HDFS truncate
Message-ID: <81032606.39157.1332283780276.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <2049515590.23016.1331873625982.JavaMail.tomcat@hel.zones.apache.org>

[
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233904#comment-13233904 ]

Todd Lipcon commented on HDFS-3107:
-----------------------------------

IMO adding truncate() adds a bunch of non-trivial complexity. It's not so much that truncating a block is hard -- rather, it breaks a serious invariant we rely on elsewhere: blocks only get longer after they are created. That means we have to revisit code all over HDFS -- in particular some of the trickiest bits around block synchronization -- to get this to work.

It's not insurmountable, but I would like to know a lot more about the use case before commenting on the API/semantics. Maybe you can open a JIRA or upload a design about your transactional HDFS feature, so we can understand the motivation better? Otherwise I'm more inclined to agree with Eli's suggestion to remove append entirely (please continue that discussion on-list, though).

{quote}
After appends were enabled in HDFS, we have seen a lot of cases where a lot of (mainly text, or even compressed text) datasets were merged using appends. This is where customers realize their mistake immediately after starting to append, and do a ctrl-c.
{quote}

I don't follow... we don't even expose append() via the shell. And if we did, would users actually be using "fs -append" to manually write new lines of data into their Hadoop systems?

> HDFS truncate
> -------------
>
>                 Key: HDFS-3107
>                 URL: https://issues.apache.org/jira/browse/HDFS-3107
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: data-node, name-node
>            Reporter: Lei Chang
>        Attachments: HDFS_truncate_semantics_Mar15.pdf
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the underlying storage when a transaction is aborted.
> Currently HDFS does not support truncate, a standard POSIX operation that is the reverse of append. This forces upper-layer applications to use ugly workarounds (such as keeping track of the discarded byte range per file in a separate metadata store, and periodically running a vacuum process to rewrite compacted files) to overcome this limitation of HDFS.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
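[Editor's note: for readers unfamiliar with the POSIX semantics the issue description refers to, truncate cuts a file back to a given length, discarding the bytes past it -- the inverse of what append does. The proposed HDFS API is not specified in this thread; the sketch below only demonstrates the local-filesystem behavior using standard java.nio, with a hypothetical demo path.]

```java
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class TruncateDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical demo path; any writable local file works.
        Path p = Paths.get("/tmp/truncate_demo.txt");
        Files.write(p, "hello world".getBytes());  // file is now 11 bytes

        // truncate(5) discards everything past byte 5 -- the reverse of append.
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.WRITE)) {
            ch.truncate(5);
        }
        System.out.println(Files.size(p));  // prints 5
    }
}
```

Without such an operation, an aborted transaction's partial append can only be undone by rewriting the file or tracking the valid length externally, which is exactly the workaround the description complains about.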