hadoop-hdfs-issues mailing list archives

From "Roman Shaposhnik (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3107) HDFS truncate
Date Wed, 01 Oct 2014 00:48:37 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154121#comment-14154121 ]

Roman Shaposhnik commented on HDFS-3107:
----------------------------------------

FWIW I would like to add a few data points to what [~shv] has said:
  # in its current form, this is an extremely useful self-contained feature that allows various
vendors of solutions running on Hadoop to build products that have a much easier time running
on HDFS.
  # it is true that there is currently no immediate integration with snapshot functionality,
but the way the current patch is implemented makes it extremely easy to expand the scope of
the feature to snapshots. In other words, if the current implementation gets committed, it
will NOT force a migration later. Snapshot+truncate can be added in later releases
of HDFS, and applications targeting truncate as it is currently implemented will continue to
run unmodified.
  # as HDFS-7056 indicated, it will take more time to come up with a design and implementation
of the complementary functionality that would extend truncate to snapshotted files. It would be
unfortunate to hold the current patch hostage, given that today it delivers
much-needed functionality AND allows for a smooth migration once snapshot+truncate gets
implemented.
  # we all know that features sitting in a branch don't get exposed to commercial distributions
and workloads as much as the ones hitting trunk do. This is, of course, exactly the right approach
for features that are half-baked or not self-contained, but in this particular
case committing the patch would benefit us all by giving customers access to the self-contained
feature AND letting us start receiving feedback for the more extended functionality much earlier.
 
Hope this provides additional food for thought to reconsider this patch for inclusion. Also,
FWIW, based on our testing, this feels like an extremely useful and important feature to get
into Hadoop now and extend to cover snapshots later.

> HDFS truncate
> -------------
>
>                 Key: HDFS-3107
>                 URL: https://issues.apache.org/jira/browse/HDFS-3107
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, namenode
>            Reporter: Lei Chang
>            Assignee: Plamen Jeliazkov
>         Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch,
HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf,
HDFS_truncate_semantics_Mar21.pdf, editsStored
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the underlying storage
when a transaction is aborted. Currently HDFS does not support truncate (a standard POSIX
operation), which is the reverse of append. This forces upper-layer applications to use
ugly workarounds (such as keeping track of the discarded byte range per file in a separate
metadata store, and periodically running a vacuum process to rewrite compacted files) to overcome
this limitation of HDFS.
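
As a rough illustration of the undo pattern described above, here is a minimal sketch that uses POSIX-style truncate on a local file, not HDFS. The helper names `append_with_undo` and `demo` are hypothetical and exist only for this example; the sketch does not reflect the API or semantics of the HDFS-3107 patch.

```python
import os
import tempfile


def append_with_undo(path, payload, commit):
    """Append payload to path; on abort, truncate back to the prior length.

    Hypothetical helper for illustration only (local file, not HDFS).
    Returns the file length after the operation.
    """
    old_len = os.path.getsize(path)   # remember the length before the append
    with open(path, "ab") as f:
        f.write(payload)              # the "transaction" appends its data
    if not commit:
        os.truncate(path, old_len)    # abort: truncate reverses the append
    return os.path.getsize(path)


def demo():
    """Run one committed and one aborted append against a temp file."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        after_commit = append_with_undo(path, b"committed", commit=True)
        after_abort = append_with_undo(path, b"aborted", commit=False)
        with open(path, "rb") as f:
            data = f.read()
        return after_commit, after_abort, data
    finally:
        os.remove(path)
```

With truncate available, the aborted append leaves no trace in the file, so there is no need for a separate record of discarded byte ranges or a later compaction pass.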



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
