hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3107) HDFS truncate
Date Wed, 17 Sep 2014 20:10:38 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137882#comment-14137882 ]

Colin Patrick McCabe commented on HDFS-3107:

bq. The main use case as far as I understand from this and other conversations is transaction
handling for external databases. DB writes its transactions into an HDFS file. While transactions
succeed the DB keeps writing the same file. But when a tx fails it is aborted and the file
is truncated to the previous successful transaction.

As I mentioned earlier, the external database could simply use length-prefixed records.  Then,
if it encounters a partial record, it is ignored.  Flume has been doing something like this
for a while.  So I don't see this as a very important use case.
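
To make the length-prefixed idea concrete, here is a minimal sketch. The 4-byte big-endian prefix and the names {{write_record}} / {{read_records}} are my own assumptions for illustration; this is not Flume's actual on-disk format, and it runs against an in-memory buffer rather than HDFS.

```python
import io
import struct

# Assumed framing: a 4-byte big-endian length, then the payload bytes.
HEADER = struct.Struct(">I")


def write_record(out, payload: bytes) -> None:
    """Append one length-prefixed record to a binary stream."""
    out.write(HEADER.pack(len(payload)))
    out.write(payload)


def read_records(data: bytes):
    """Yield every complete record; stop quietly at a partial trailing one."""
    pos = 0
    while pos + HEADER.size <= len(data):
        (length,) = HEADER.unpack_from(data, pos)
        start = pos + HEADER.size
        end = start + length
        if end > len(data):
            break  # partial record: the writer stopped mid-write, so ignore it
        yield data[start:end]
        pos = end


buf = io.BytesIO()
write_record(buf, b"tx-1")
write_record(buf, b"tx-2")
log = buf.getvalue()

# Simulate a failed third write: the header claims 10 bytes, only 3 arrived.
torn = log + HEADER.pack(10) + b"abc"
records = list(read_records(torn))
```

Here {{records}} contains only the two complete transactions; the torn tail is skipped without needing to truncate the file. A real log would probably also want a per-record checksum, which this sketch leaves out.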

bq. There is no divergent history. If you truncate you lose data that you truncated. You
will not be able to open file for append until truncate is actually completed and DNs shrink
the last block replicas. Then file can be opened for append and add new data.

OK, that's fair.  The single-writer nature of HDFS makes this easier.

There are also interactions with files open for read.  The NameNode doesn't know which files
are open for read so you cannot forbid this.  Keep in mind that there is a generous amount
of buffering inside DFSInputStream.  So following a truncate + append new data, we may continue
to read the "old" truncated data from the buffer inside the DFSInputStream's {{RemoteBlockReader2}}
for a while.  That is partly what I meant by a "divergent history." This is probably ok, but
the semantics need to be spelled out in the design doc.
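
A toy model of the stale-read scenario, to show what I mean. The {{ToyFile}} and {{ToyBufferedReader}} classes are hypothetical stand-ins that I made up for this sketch; they do not reflect how DFSInputStream or {{RemoteBlockReader2}} are actually implemented, only the general effect of read-ahead buffering.

```python
class ToyFile:
    """In-memory stand-in for a file (hypothetical; not HDFS)."""

    def __init__(self, data: bytes):
        self.data = bytearray(data)

    def truncate(self, length: int) -> None:
        del self.data[length:]

    def append(self, more: bytes) -> None:
        self.data += more


class ToyBufferedReader:
    """Reads ahead in fixed-size chunks and serves reads from that buffer."""

    def __init__(self, f: ToyFile, chunk: int = 16):
        self.f = f
        self.chunk = chunk
        self.pos = 0    # next file offset to hand to the caller
        self.buf = b""  # read-ahead bytes that start at self.pos

    def read(self, n: int) -> bytes:
        if len(self.buf) < n:
            # Refill only when the buffer cannot satisfy the request.
            start = self.pos + len(self.buf)
            self.buf += bytes(self.f.data[start:start + self.chunk])
        out, self.buf = self.buf[:n], self.buf[n:]
        self.pos += len(out)
        return out


f = ToyFile(b"AAAABBBBCCCC")
reader = ToyBufferedReader(f)
first = reader.read(4)   # read-ahead pulls all 12 bytes into the buffer

f.truncate(4)            # drop BBBBCCCC
f.append(b"XXXXYYYY")    # file is now AAAAXXXXYYYY

stale = reader.read(8)   # served from the buffer, not from the file
fresh = ToyBufferedReader(f).read(12)
```

In this model {{stale}} is the old truncated bytes while {{fresh}} sees the new contents, so two readers of the same offsets disagree. Whether that is acceptable is exactly the kind of thing the design doc should state.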

bq. How this interacts with snapshots... is something yet to be designed


bq. There is a patch attached. Did you have a chance to review? It is much simpler than append,
but it does not allow to truncate files in snapshots. If we decide to implement copy-on-write
approach for truncated files in snapshots, then we may end up creating a branch.

I'm -1 on committing anything without a design doc.  I apologize if this seems harsh, but
I don't want there to be any ambiguity.  I think you are on the right track, but let's see
a complete design and then get started with committing the code.  Thanks, Konstantin.

> HDFS truncate
> -------------
>                 Key: HDFS-3107
>                 URL: https://issues.apache.org/jira/browse/HDFS-3107
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, namenode
>            Reporter: Lei Chang
>            Assignee: Plamen Jeliazkov
>         Attachments: HDFS-3107.patch, HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
> Systems with transaction support often need to undo changes made to the underlying storage
> when a transaction is aborted. Currently HDFS does not support truncate (a standard POSIX
> operation) which is a reverse operation of append, which makes upper layer applications use
> ugly workarounds (such as keeping track of the discarded byte range per file in a separate
> metadata store, and periodically running a vacuum process to rewrite compacted files) to overcome
> this limitation of HDFS.

This message was sent by Atlassian JIRA
