hadoop-hdfs-issues mailing list archives

From "Lei Chang (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3107) HDFS truncate
Date Sat, 24 Mar 2012 04:35:36 GMT

https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237430#comment-13237430

Lei Chang commented on HDFS-3107:

> The proposed way to go about #3 by creating copies at the DN level and truncating there
seems messy, but if you think about it as a variant of #2 that leaks less information into
the API (block boundaries, contents of last segment), it seems simpler to me.

Agreed: if we look only at the simplicity of the internal RPC APIs, #3 is simpler.
From the implementation side, however, in #3 the client needs to coordinate with both the NN
and the DNs. There are many cases the client has to handle when some nodes fail in the
copy/truncate phase while others succeed. For example:
1) The client has to work with the NN to handle the failure and perform recovery when a DN
fails. This is somewhat like the pipeline rebuild and recovery in the APPEND case.
2) A client failure introduces some extra work too. (#1 also has to deal with this case, but
in a simpler way.)

Thus, the implementation of #1 should be easier.

You raised a good point about the security of the temporary file: it should be created with
the same access permissions as the file being truncated.
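To illustrate that point, here is a minimal sketch of creating a temporary copy that carries over the original file's permissions. It uses the local filesystem via java.nio for the sake of a self-contained example; actual HDFS code would go through FileSystem/FsPermission instead, and the ".truncate.tmp" naming is purely hypothetical:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.PosixFilePermission;
import java.util.Set;

public class TruncateTemp {
    // Create a temporary copy next to the original, carrying over the
    // original's permissions so the copy is no more readable than the source.
    static Path createTempWithSamePermissions(Path original) throws IOException {
        Set<PosixFilePermission> perms = Files.getPosixFilePermissions(original);
        Path tmp = original.resolveSibling(original.getFileName() + ".truncate.tmp");
        Files.copy(original, tmp, StandardCopyOption.REPLACE_EXISTING);
        // Set permissions explicitly: a plain copy may pick up the process umask
        // rather than the source file's mode.
        Files.setPosixFilePermissions(tmp, perms);
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("demo", ".dat");
        Files.write(f, new byte[]{1, 2, 3});
        Path tmp = createTempWithSamePermissions(f);
        System.out.println(Files.getPosixFilePermissions(tmp)
                .equals(Files.getPosixFilePermissions(f))); // true on POSIX systems
    }
}
```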

> HDFS truncate
> -------------
>                 Key: HDFS-3107
>                 URL: https://issues.apache.org/jira/browse/HDFS-3107
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: data-node, name-node
>            Reporter: Lei Chang
>         Attachments: HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
> Systems with transaction support often need to undo changes made to the underlying storage
> when a transaction is aborted. Currently HDFS does not support truncate (a standard POSIX
> operation), the reverse operation of append. This forces upper-layer applications to use
> ugly workarounds, such as keeping track of the discarded byte range per file in a separate
> metadata store and periodically running a vacuum process to rewrite compacted files, to
> overcome this limitation of HDFS.
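The "discarded byte range" workaround described in the issue can be sketched as follows. This is a simplified, hypothetical illustration against the local filesystem: a real deployment would keep the logical lengths in an external metadata store, keep the data in HDFS, and add the periodic vacuum/rewrite pass, none of which is shown here:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Sketch of the workaround: since the store cannot truncate, an aborted
// transaction only shrinks the file's *logical* length in side metadata,
// and every read is capped at that length.
public class LogicalTruncate {
    private final Map<Path, Long> logicalLength = new HashMap<>();

    // "Truncate" by recording a shorter logical length; no bytes are removed.
    void truncate(Path file, long newLength) {
        logicalLength.put(file, newLength);
    }

    // Read only up to the logical length, ignoring discarded trailing bytes.
    byte[] read(Path file) throws IOException {
        byte[] raw = Files.readAllBytes(file);
        long len = logicalLength.getOrDefault(file, (long) raw.length);
        return Arrays.copyOf(raw, (int) Math.min(len, raw.length));
    }

    public static void main(String[] args) throws IOException {
        LogicalTruncate store = new LogicalTruncate();
        Path f = Files.createTempFile("txn", ".dat");
        Files.write(f, "committed-data+aborted-tail".getBytes());
        store.truncate(f, "committed-data".length()); // abort discards the tail
        System.out.println(new String(store.read(f))); // prints "committed-data"
    }
}
```

A native truncate in HDFS would remove both the side metadata store and the vacuum pass this pattern requires.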

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

