hadoop-hdfs-issues mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3107) HDFS truncate
Date Tue, 09 Sep 2014 18:45:32 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14127371#comment-14127371 ]

Konstantin Shvachko commented on HDFS-3107:

I see your point. I'll let Plamen speak about the current state of the art. Let's talk about
how it should be.
# The documentation on snapshots explicitly states "there is no data copying". So maybe copy-on-write
is not desirable here, although appealing.
# Another way is not to remove data during truncate if the file is in a snapshot. Just reduce
the length, and deal with block removal / truncation when the snapshot is removed. Sort of symmetrical
to append.
# The simplest is to disallow truncate on files that are in a snapshot, as you indicated.
Maybe we should do this first and add one of the above when a use case emerges?
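
To make the difference between options 2 and 3 concrete, here is a minimal Java sketch of a toy in-memory model. It is not HDFS code and does not reflect any actual patch: ToyFile, takeSnapshot, truncateStrict, truncateDeferred and deleteSnapshot are hypothetical names, and lengths stand in for blocks. It only illustrates the bookkeeping: with deferred removal the visible length shrinks at once while the data a snapshot still needs is retained until that snapshot is deleted.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only; none of these names are real HDFS classes or methods. */
public class TruncateSketch {

    static class ToyFile {
        long visibleLength;   // length seen by readers of the current file
        long storedLength;    // bytes still physically retained
        // Lengths captured by each snapshot that still references this file.
        final List<Long> snapshotLengths = new ArrayList<>();

        ToyFile(long length) {
            this.visibleLength = length;
            this.storedLength = length;
        }

        void takeSnapshot() {
            snapshotLengths.add(visibleLength);
        }

        /** Option 3: disallow truncate while the file is in any snapshot. */
        void truncateStrict(long newLength) {
            if (!snapshotLengths.isEmpty()) {
                throw new UnsupportedOperationException("file is in a snapshot");
            }
            truncateDeferred(newLength);
        }

        /** Option 2: reduce the length now, defer removal of snapshotted data. */
        void truncateDeferred(long newLength) {
            if (newLength < 0 || newLength > visibleLength) {
                throw new IllegalArgumentException("bad length: " + newLength);
            }
            visibleLength = newLength;
            reclaim();
        }

        /** Removing a snapshot is when the deferred data can finally go away. */
        void deleteSnapshot(int index) {
            snapshotLengths.remove(index); // int argument: removes by position
            reclaim();
        }

        /** Keep only as many bytes as the file or some snapshot still needs. */
        private void reclaim() {
            long needed = visibleLength;
            for (long len : snapshotLengths) {
                needed = Math.max(needed, len);
            }
            storedLength = needed;
        }
    }
}
```

In this model, a 100-byte file with one snapshot that is truncated to 40 via truncateDeferred reports visibleLength 40 but keeps storedLength at 100 until the snapshot is deleted, whereas truncateStrict rejects the same call outright.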

> HDFS truncate
> -------------
>                 Key: HDFS-3107
>                 URL: https://issues.apache.org/jira/browse/HDFS-3107
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, namenode
>            Reporter: Lei Chang
>            Assignee: Plamen Jeliazkov
>         Attachments: HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
> Systems with transaction support often need to undo changes made to the underlying storage
> when a transaction is aborted. Currently HDFS does not support truncate (a standard POSIX
> operation), which is the reverse operation of append. This forces upper-layer applications to use
> ugly workarounds (such as keeping track of the discarded byte range per file in a separate
> metadata store, and periodically running a vacuum process to rewrite compacted files) to overcome
> this limitation of HDFS.

This message was sent by Atlassian JIRA
