From: "Todd Lipcon (Commented) (JIRA)"
To: hdfs-issues@hadoop.apache.org
Reply-To: hdfs-issues@hadoop.apache.org
Date: Tue, 20 Mar 2012 22:49:40 +0000 (UTC)
Subject: [jira] [Commented] (HDFS-3107) HDFS truncate
Message-ID: <81032606.39157.1332283780276.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <2049515590.23016.1331873625982.JavaMail.tomcat@hel.zones.apache.org>

[
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233904#comment-13233904 ]

Todd Lipcon commented on HDFS-3107:
-----------------------------------

IMO adding truncate() adds a bunch of non-trivial complexity. It's not so much that truncating a block is hard -- rather, it breaks a serious invariant we rely on elsewhere: blocks only get longer after they are created. That means we have to revisit code all over HDFS -- in particular some of the trickiest bits around block synchronization -- to get this to work.

It's not insurmountable, but I would like to know a lot more about the use case before commenting on the API/semantics. Maybe you can open a JIRA or upload a design about your transactional HDFS feature, so we can understand the motivation better? Otherwise I'm more inclined to agree with Eli's suggestion to remove append entirely (please continue that discussion on-list, though).

{quote}
After appends were enabled in HDFS, we have seen a lot of cases where a lot of (mainly text, or even compressed text) datasets were merged using appends. This is where customers realize their mistake immediately after starting to append, and do a ctrl-c.
{quote}

I don't follow... we don't even expose append() via the shell. And if we did, would users actually be using "fs -append" to manually write new lines of data into their Hadoop systems?

> HDFS truncate
> -------------
>
>                 Key: HDFS-3107
>                 URL: https://issues.apache.org/jira/browse/HDFS-3107
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: data-node, name-node
>            Reporter: Lei Chang
>        Attachments: HDFS_truncate_semantics_Mar15.pdf
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the underlying storage when a transaction is aborted.
> Currently HDFS does not support truncate, a standard POSIX operation that is the reverse of append. This forces upper-layer applications to use ugly workarounds (such as keeping track of the discarded byte range per file in a separate metadata store, and periodically running a vacuum process to rewrite compacted files) to overcome this limitation of HDFS.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
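[Editor's note: for readers unfamiliar with the POSIX semantics the issue description refers to, truncate cuts a file back to a given length, discarding the bytes past it -- the inverse of what append does. The proposed HDFS API is not specified in this thread; the sketch below only demonstrates the local-filesystem behavior using standard java.nio, with a hypothetical demo path.]

```java
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class TruncateDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical demo path; any writable local file works.
        Path p = Paths.get("/tmp/truncate_demo.txt");
        Files.write(p, "hello world".getBytes());  // file is now 11 bytes

        // truncate(5) discards everything past byte 5 -- the reverse of append.
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.WRITE)) {
            ch.truncate(5);
        }
        System.out.println(Files.size(p));  // prints 5
    }
}
```

Without such an operation, an aborted transaction's partial append can only be undone by rewriting the file or tracking the valid length externally, which is exactly the workaround the description complains about.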