hadoop-hdfs-issues mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-3107) HDFS truncate
Date Tue, 30 Sep 2014 07:10:37 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152875#comment-14152875 ]

Konstantin Shvachko edited comment on HDFS-3107 at 9/30/14 7:09 AM:
--------------------------------------------------------------------

Thanks Dhruba and Colin for your reviews of the design document.
Colin, I'll incorporate your suggestions. But it looks like you got everything right from
the current edition.

??{{boolean truncate(Path src, long newLength)}}. do we really need the boolean here???
* This is an optimization for the case when truncate happens on the block boundary. Clients
will save one RPC call in this particular case.
From the NameNode's perspective, returning the boolean does not require any extra processing.
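To make the boundary optimization concrete, here is a toy sketch (illustrative only; `truncate` and `BLOCK_SIZE` below are hypothetical stand-ins, not the actual NameNode code): when the new length falls on a block boundary the call completes immediately and returns true; otherwise it returns false and the client must wait for recovery of the last block.

```java
// Toy model of the boolean returned by truncate(Path src, long newLength).
// This is NOT the HDFS implementation; names and logic are illustrative.
public class TruncateSketch {
    static final long BLOCK_SIZE = 128L * 1024 * 1024; // assumed 128 MB blocks

    /** Returns true if truncation completes immediately (block boundary),
     *  false if the last block would first need recovery on a DataNode. */
    static boolean truncate(long fileLength, long newLength) {
        if (newLength > fileLength) {
            throw new IllegalArgumentException("truncate cannot extend a file");
        }
        return newLength % BLOCK_SIZE == 0;
    }

    public static void main(String[] args) {
        // Boundary case: done in the one RPC, no extra waiting needed.
        assert truncate(3 * BLOCK_SIZE, 2 * BLOCK_SIZE);
        // Mid-block case: the client must wait for block recovery to finish.
        assert !truncate(3 * BLOCK_SIZE, 2 * BLOCK_SIZE + 1);
    }
}
```

The boolean lets a client skip the follow-up "is truncation complete?" RPC in the boundary case.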

??DFSInputStream#locatedBlocks will continue to have the block information it had prior to
truncation.??
* Don't we have [the same behaviour with deletes|https://issues.apache.org/jira/browse/HDFS-3107?focusedCommentId=13237310&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13237310]?
Somebody can delete a file on the NameNode, but readers will keep reading old blocks until
they are deleted.
Truncate doesn't add anything new in that regard.
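To make the delete analogy concrete, a toy simulation (not actual DFSClient code; names are illustrative) of a reader that snapshots its block list at open time, the way {{DFSInputStream#locatedBlocks}} does, and therefore keeps seeing the old blocks after a namespace delete:

```java
import java.util.*;

// Toy model: a reader caches its block list at open time, so a later
// namespace delete (or truncate) on the "NameNode" does not change
// what the reader iterates over. Illustrative only.
public class StaleReaderSketch {
    static Map<String, List<String>> namespace = new HashMap<>();

    /** Reader snapshots the block list, like DFSInputStream#locatedBlocks. */
    static List<String> open(String path) {
        return new ArrayList<>(namespace.get(path));
    }

    public static void main(String[] args) {
        namespace.put("/f", Arrays.asList("blk_1", "blk_2"));
        List<String> readerBlocks = open("/f");
        namespace.remove("/f");          // file deleted on the "NameNode"
        assert readerBlocks.size() == 2; // reader still sees the old blocks
    }
}
```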

??I don't think we should commit anything to trunk until we figure out how this integrates
with snapshots.??
* You may have seen the HDFS-7056 subtask. Mentioning it again to reassure you there is no intention
of avoiding the snapshot issue.
* People agreed above that [they are OK|https://issues.apache.org/jira/browse/HDFS-3107?focusedCommentId=14129351&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14129351]
implementing snapshot integration in a separate jira.
* We also [agreed not to port it to branch 2|https://issues.apache.org/jira/browse/HDFS-3107?focusedCommentId=14148406&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14148406]
until this is completed.
* And there was a [request to commit|https://issues.apache.org/jira/browse/HDFS-3107?focusedCommentId=14150308&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14150308]
this sooner rather than later.
* Besides, it seems from your comments that you yourself are in favour of option 3 for snapshots
from the design.

So the question arises: what is unclear in the truncate-snapshot story, and why do you object to
committing anything to trunk?



> HDFS truncate
> -------------
>
>                 Key: HDFS-3107
>                 URL: https://issues.apache.org/jira/browse/HDFS-3107
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, namenode
>            Reporter: Lei Chang
>            Assignee: Plamen Jeliazkov
>         Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch,
HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf,
editsStored
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the underlying storage
> when a transaction is aborted. Currently HDFS does not support truncate (a standard POSIX
> operation), the reverse of append. This forces upper-layer applications to use ugly workarounds,
> such as keeping track of the discarded byte range per file in a separate metadata store and
> periodically running a vacuum process to rewrite compacted files, to overcome this limitation
> of HDFS.
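As a local-filesystem illustration of the truncate-as-rollback pattern the description refers to (using java.nio on an ordinary file, not HDFS): a transaction appends to a log, then aborts by truncating back to its checkpoint length.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.*;

// Rollback-by-truncate on a local file, as a stand-in for what a
// transactional layer on HDFS would do once truncate is available.
public class TruncateRollback {
    /** Appends past a checkpoint, aborts by truncating, returns final length. */
    public static long rollbackDemo() {
        try {
            Path p = Files.createTempFile("txn", ".log");
            Files.write(p, "committed".getBytes());        // durable state: 9 bytes
            long checkpoint = Files.size(p);
            Files.write(p, "uncommitted".getBytes(), StandardOpenOption.APPEND);
            // Abort: undo the append by truncating to the checkpoint.
            try (FileChannel ch = FileChannel.open(p, StandardOpenOption.WRITE)) {
                ch.truncate(checkpoint);
            }
            long len = Files.size(p);
            Files.delete(p);
            return len;                                    // back to 9 bytes
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        assert rollbackDemo() == 9L;
    }
}
```

Without truncate, the same rollback needs the separate-metadata-plus-vacuum workaround described above.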



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
