hadoop-hdfs-issues mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6087) Unify HDFS write/append/truncate
Date Fri, 14 Mar 2014 21:32:47 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935686#comment-13935686 ]

Konstantin Shvachko commented on HDFS-6087:

Based on what you write, I see two main problems with your approach.
# A block cannot be read by others while under construction, until it is fully written and committed.
That would be a step back. Making UC-blocks readable was one of the append design requirements
(see HDFS-265 and preceding work). If a slow client writes to a block at 1 KB/min, others will
have to wait hours before they can see any progress on the file.
# Your proposal (if I understand it correctly) will potentially lead to a lot of small blocks
if appends, fsyncs (and truncates) are used intensively.
Say, in order to overcome problem (1), I write my application so that it closes the file after
each 1 KB written and reopens it for append one minute later. You get lots of 1 KB blocks. And
small blocks are bad for the NameNode, as we know.
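The cost of the close-and-reopen workaround in point (2) can be made concrete with back-of-the-envelope arithmetic. A minimal sketch in Python, assuming the commonly cited figure of roughly 150 bytes of NameNode heap per block object (that figure is an assumption, not stated in this thread):

```python
# Back-of-the-envelope cost of the workaround in problem (2): a client
# writes 1 KB, closes the file, and reopens it for append a minute later.

BLOCK_METADATA_BYTES = 150  # rough NameNode heap per block (assumed figure)
WRITE_KB_PER_MIN = 1        # the slow client from the example

minutes_per_day = 24 * 60

# Each 1 KB close/reopen cycle leaves one tiny block behind.
small_blocks_per_day = minutes_per_day * WRITE_KB_PER_MIN

# The same 1440 KB appended into one mutable block would still fit far
# inside a single 128 MB block, i.e. one block object on the NameNode.
namenode_heap_per_day = small_blocks_per_day * BLOCK_METADATA_BYTES

print(small_blocks_per_day)   # 1440 tiny blocks per file per day
print(namenode_heap_per_day)  # 216000 bytes of heap vs ~150 for one block
```

So a single slow writer costs the NameNode three orders of magnitude more metadata per day than the current mutable-block design, before counting the extra RPC and DataNode overhead per block.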

> Unify HDFS write/append/truncate
> --------------------------------
>                 Key: HDFS-6087
>                 URL: https://issues.apache.org/jira/browse/HDFS-6087
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>            Reporter: Guo Ruijing
>         Attachments: HDFS Design Proposal.pdf, HDFS Design Proposal_3_14.pdf
> In the existing implementation, an HDFS file can be appended and an HDFS block can be reopened
for append. This design introduces complexity, including lease recovery. If we design the HDFS
block as immutable, append & truncate become very simple. The idea is that an HDFS
block is immutable once the block is committed to the NameNode. If the block is not committed to
the NameNode, it is the HDFS client's responsibility to re-add it with a new block ID.
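The proposed lifecycle can be sketched as a toy state machine. A minimal illustration in Python; every name here is hypothetical and chosen only to mirror the description above, not the actual patch:

```python
# Toy model of the proposal: a block becomes immutable once committed
# to the NameNode, so appending afterwards must allocate a fresh block
# with a new ID. All names are hypothetical illustrations.
import itertools

_block_ids = itertools.count(1)

class Block:
    def __init__(self):
        self.id = next(_block_ids)
        self.data = b""
        self.committed = False  # committed blocks are immutable

def append(block, data):
    """Append to an uncommitted block; otherwise start a new block."""
    if block.committed:
        # Immutable: the client writes the bytes into a brand-new block.
        fresh = Block()
        fresh.data = data
        return fresh
    block.data += data
    return block

b1 = append(Block(), b"hello")
b1.committed = True           # committed to the NameNode
b2 = append(b1, b" world")    # forced onto a new block
print(b2.id != b1.id)         # True: append after commit got a new ID
```

Under this model there is no reopening of a committed block, which is what eliminates lease recovery on the old block, at the price of the small-block growth discussed in the comment above.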

This message was sent by Atlassian JIRA
