hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3107) HDFS truncate
Date Wed, 01 Oct 2014 23:51:39 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155816#comment-14155816 ]

Colin Patrick McCabe commented on HDFS-3107:
--------------------------------------------

To summarize the discussion above: [~jingzhao], [~sureshms], and I have made the point that
this feature is still incomplete until it supports snapshots.  Since snapshot support may
involve fundamentally changing the design (e.g. copying the final partial block file versus
using block recovery), we need to figure it out before merging to trunk or branch-2.

As far as I can see, there are two options here.  We could roll snapshot support into this
patch, or start a feature branch with the above commit and then do snapshot support (and
whatever else is needed) in that feature branch.  I'm fine with either option; I have no
strong preference.  I'm happy to review anything, and Jing has offered to help with snapshot
support.

[~rvs]: I realize that getting truncate into a release is important to you.  If people get
this done in time for 2.6, I wouldn't oppose putting it in.  But you will have to convince
the release manager for 2.6 and propose it to the community.  Since 2.6 is already a very
big release, I expect you will get pushback.

I also think you should evaluate writing length-delimited records instead of relying on
truncate.  Truncate is an operation that can fail, and relying on truncate to clean up
mistakes will always be more fragile than writing in a format that can ignore torn records
automatically.  Think about it: if a write fails because of a disk I/O error, isn't it
likely that truncate will also hit an error when trying to shrink that block file?  And for
write failures caused by a client dropping off the network, truncate will be equally
impossible, for lack of connectivity.
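
To illustrate what I mean, here is a rough sketch (not tied to any HDFS API; the class and
method names are made up for the example): each record carries a length prefix and a
checksum, so a reader can simply stop at a torn or corrupt tail instead of needing truncate
to clean it up.

{code:java}
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32;

public class LengthDelimitedRecords {

    // Append one record as [int length][long crc32][payload bytes].
    static void writeRecord(DataOutputStream out, byte[] payload) throws IOException {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        out.writeInt(payload.length);
        out.writeLong(crc.getValue());
        out.write(payload);
    }

    // Read records until EOF or a torn/corrupt tail; return the count of good records.
    static int readRecords(DataInputStream in) throws IOException {
        int good = 0;
        while (true) {
            int len;
            try {
                len = in.readInt();            // clean EOF ends the loop normally
            } catch (EOFException e) {
                break;
            }
            if (len < 0) {
                break;                         // garbage length: treat as a corrupt tail
            }
            byte[] payload = new byte[len];
            long expected;
            try {
                expected = in.readLong();
                in.readFully(payload);         // short read here means a torn record
            } catch (EOFException e) {
                break;                         // ignore the torn tail, no truncate needed
            }
            CRC32 crc = new CRC32();
            crc.update(payload, 0, payload.length);
            if (crc.getValue() != expected) {
                break;                         // checksum mismatch: stop at the corrupt tail
            }
            good++;
        }
        return good;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        writeRecord(out, "record-1".getBytes(StandardCharsets.UTF_8));
        writeRecord(out, "record-2".getBytes(StandardCharsets.UTF_8));
        byte[] bytes = buf.toByteArray();
        // Simulate a torn write by dropping the last few bytes of the second record.
        byte[] torn = Arrays.copyOf(bytes, bytes.length - 3);
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(torn));
        System.out.println("good records: " + readRecords(in));  // prints 1
    }
}
{code}

The point is that recovery becomes a read-side decision: after a crash you just replay the
file and ignore the torn tail, with no cleanup write (and hence no truncate) required.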

> HDFS truncate
> -------------
>
>                 Key: HDFS-3107
>                 URL: https://issues.apache.org/jira/browse/HDFS-3107
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, namenode
>            Reporter: Lei Chang
>            Assignee: Plamen Jeliazkov
>         Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch,
> HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf,
> HDFS_truncate_semantics_Mar21.pdf, editsStored
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the underlying storage
> when a transaction is aborted. Currently HDFS does not support truncate (a standard POSIX
> operation), which is the inverse of append. This forces upper-layer applications to use
> ugly workarounds (such as keeping track of the discarded byte range per file in a separate
> metadata store, and periodically running a vacuum process to rewrite compacted files) to
> overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
