hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3107) HDFS truncate
Date Wed, 17 Sep 2014 20:25:34 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137911#comment-14137911 ]

Colin Patrick McCabe commented on HDFS-3107:

bq. Using a record-length prefix is not a good fix to get around this. What happens if you
fail when writing your record length?

In that case, the record is incomplete and not valid.  It's pretty clear when bytes are missing
from a fixed-length, 4-byte length prefix.
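
A minimal sketch of the length-prefix idea, assuming a 4-byte big-endian length followed by the payload. The layout and the names write_record and read_valid_records are hypothetical and are not taken from HDFS or from any file format discussed on this issue. The reader stops at the first record whose prefix or payload is cut short and reports how many bytes were valid.

```python
import io
import struct

# Assumed layout: 4-byte big-endian unsigned length, then the payload bytes.
LENGTH_PREFIX = struct.Struct(">I")


def write_record(stream, payload):
    """Write one record: a 4-byte length prefix followed by the payload."""
    stream.write(LENGTH_PREFIX.pack(len(payload)))
    stream.write(payload)


def read_valid_records(data):
    """Return (records, valid_length), stopping at the first incomplete record."""
    records = []
    offset = 0
    while True:
        header_end = offset + LENGTH_PREFIX.size
        if header_end > len(data):
            break  # fewer than 4 bytes remain: the length prefix itself is torn
        (length,) = LENGTH_PREFIX.unpack_from(data, offset)
        payload_end = header_end + length
        if payload_end > len(data):
            break  # the prefix is intact but the payload was cut short
        records.append(data[header_end:payload_end])
        offset = payload_end
    return records, offset
```

A reader built this way ignores everything past valid_length, so a failure while writing the length itself looks the same as any other torn record.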

bq. I would argue that this has everything to do with append. You are absolutely correct that
HDFS can write a bad file on a standard open/write. The 'undo' for this failure is the delete
operation. Your data integrity is preserved regardless of any external factors (file format,
metadata, applications, etc). You can't have bad data if you never write bad data.

I don't follow.  What does append have to do with writing partial records?  You can write
partial records without append, and append doesn't make it any more or less likely.

As I said earlier, "append" really should have been called "reopen for write".  You don't
need to use "append" to create a file and append to it (confusing, I know).
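
As a rough analogy only, using local files in Python rather than the HDFS client API: a file written sequentially in one open session ends up with the same bytes as one that is closed and reopened in append mode for each chunk. The helper names are hypothetical.

```python
import os
import tempfile


def write_in_one_session(path, chunks):
    """Create the file and write every chunk before closing it (no reopen)."""
    with open(path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)


def write_with_reopen(path, chunks):
    """Create an empty file, then reopen it in append mode for each chunk."""
    open(path, "wb").close()
    for chunk in chunks:
        with open(path, "ab") as f:
            f.write(chunk)


def read_all(path):
    with open(path, "rb") as f:
        return f.read()
```

Either path can leave a partial record behind if the writer dies mid-write; reopening changes when the bytes are written, not whether a write can be torn.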

> HDFS truncate
> -------------
>                 Key: HDFS-3107
>                 URL: https://issues.apache.org/jira/browse/HDFS-3107
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, namenode
>            Reporter: Lei Chang
>            Assignee: Plamen Jeliazkov
>         Attachments: HDFS-3107.patch, HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
> Systems with transaction support often need to undo changes made to the underlying storage
> when a transaction is aborted. Currently HDFS does not support truncate (a standard POSIX
> operation), which is the reverse of append. This forces upper-layer applications to use
> ugly workarounds (such as keeping track of the discarded byte range per file in a separate
> metadata store, and periodically running a vacuum process to rewrite compacted files) to
> overcome this limitation of HDFS.
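
To illustrate truncate as the reverse of append, here is a minimal sketch on a local POSIX-style filesystem using Python's os.truncate. It is not the HDFS API, which at the time of this message had no truncate call, and append_with_undo is a hypothetical helper.

```python
import os
import tempfile


def append_with_undo(path, payload, commit):
    """Append payload; if commit is False, truncate back to the prior length.

    Returns the file length after the operation.
    """
    length_before = os.path.getsize(path)
    with open(path, "ab") as f:
        f.write(payload)
    if not commit:
        # Undo the append by cutting the file back to its previous length.
        os.truncate(path, length_before)
    return os.path.getsize(path)
```

With truncate available, an aborted transaction needs no side table of discarded byte ranges and no later vacuum pass.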

This message was sent by Atlassian JIRA
