hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3107) HDFS truncate
Date Tue, 20 Mar 2012 22:49:40 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233904#comment-13233904 ]

Todd Lipcon commented on HDFS-3107:

IMO adding truncate() adds a bunch of non-trivial complexity. It's not so much because truncating
a block is that hard -- but rather because it breaks a serious invariant we have elsewhere
that blocks only get longer after they are created. This means that we have to revisit code
all over HDFS -- in particular some of the trickiest bits around block synchronization --
to get this to work. It's not insurmountable, but I would like to know a lot more about the
use case before commenting on the API/semantics.

Maybe you can open a JIRA or upload a design about your transactional HDFS feature, so we
can understand the motivation better? Otherwise I'm more inclined to agree with Eli's suggestion
to remove append entirely (please continue that discussion on-list, though).

> After appends were enabled in HDFS, we have seen a lot of cases where a lot of (mainly text,
> or even compressed text) datasets were merged using appends.
>
> This is where customers realize their mistake immediately after starting to append, and do
> a ctrl-c.

I don't follow... we don't even expose append() via the shell. And if we did, would users
actually be using "fs -append" to manually write new lines of data into their Hadoop systems??

> HDFS truncate
> -------------
>                 Key: HDFS-3107
>                 URL: https://issues.apache.org/jira/browse/HDFS-3107
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: data-node, name-node
>            Reporter: Lei Chang
>         Attachments: HDFS_truncate_semantics_Mar15.pdf
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
> Systems with transaction support often need to undo changes made to the underlying storage
> when a transaction is aborted. Currently HDFS does not support truncate (a standard POSIX
> operation), the reverse operation of append, which forces upper-layer applications to use
> ugly workarounds (such as keeping track of the discarded byte range per file in a separate
> metadata store, and periodically running a vacuum process to rewrite compacted files) to
> overcome this limitation of HDFS.
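The workaround mentioned in the description (a separate metadata store tracking the valid byte range per file, plus a periodic vacuum that rewrites files) can be sketched roughly as below. This is an illustrative simulation only, not real HDFS client code; the names `LogicalLengthStore` and `vacuum` are hypothetical, and files are stood in for by an in-memory dict.

```python
# Sketch of the truncate workaround described above: since HDFS (at the time)
# lacked truncate, an upper layer records a per-file "logical length" in a
# separate metadata store, and a vacuum pass later rewrites each file down to
# that length. All names here are illustrative, not real HDFS APIs.

class LogicalLengthStore:
    """Stands in for the separate metadata store (e.g. a database table)."""

    def __init__(self):
        self._valid = {}  # path -> logical (valid) byte length

    def record_truncate(self, path, new_length):
        # Called by the transaction layer on abort: everything past
        # new_length is appended-but-aborted data.
        self._valid[path] = new_length

    def logical_length(self, path, physical_length):
        # Files that were never logically truncated are fully valid.
        return min(self._valid.get(path, physical_length), physical_length)

    def clear(self, path):
        self._valid.pop(path, None)


def vacuum(files, store):
    """Rewrite each file down to its logical length, discarding the
    aborted suffix, then drop the metadata entry."""
    for path in list(files):
        data = files[path]
        keep = store.logical_length(path, len(data))
        files[path] = data[:keep]  # the "rewrite compacted file" step
        store.clear(path)
    return files


# Usage: a transaction appends 40 bytes past offset 100 and then aborts,
# so only the first 100 bytes are recorded as valid; vacuum compacts later.
files = {"/data/part-0": b"x" * 140, "/data/part-1": b"y" * 20}
store = LogicalLengthStore()
store.record_truncate("/data/part-0", 100)
vacuum(files, store)
print(len(files["/data/part-0"]))  # -> 100
print(len(files["/data/part-1"]))  # -> 20 (never truncated, untouched)
```

Readers between writes see stale trailing bytes until the vacuum runs, which is exactly the kind of complexity a native truncate() would remove.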

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

