hadoop-hdfs-issues mailing list archives

From "Guo Ruijing (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6087) Unify HDFS write/append/truncate
Date Sun, 16 Mar 2014 03:12:43 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936391#comment-13936391 ]

Guo Ruijing commented on HDFS-6087:

Issue: the last block is not available for reading.

Solution 1: if the block is referenced by a client, NN moves the block to the remove list
only after the client unreferences it.

1) Client calls GetBlockLocations with a Reference option
2) Client copies the block to a local buffer
3) Client sends a new UnreferenceBlocks RPC message to NN
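The reference-counting flow of solution 1 could be sketched roughly as follows. This is a hypothetical illustration, not existing NameNode code: the class BlockReferenceTracker and its method names are invented here, and real NN bookkeeping would be far more involved.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of solution 1: the NN defers deletion of a block
// until every client that referenced it has sent UnreferenceBlocks.
public class BlockReferenceTracker {
    private final Map<Long, Integer> refCounts = new HashMap<>(); // blockId -> open references
    private final Set<Long> pendingDelete = new HashSet<>();      // delete requested, still referenced
    private final Set<Long> removeList = new HashSet<>();         // safe for DN to delete

    // Client calls GetBlockLocations with the Reference option.
    public synchronized void reference(long blockId) {
        refCounts.merge(blockId, 1, Integer::sum);
    }

    // Client sends the UnreferenceBlocks RPC after copying the block locally.
    public synchronized void unreference(long blockId) {
        int n = refCounts.merge(blockId, -1, Integer::sum);
        if (n <= 0) {
            refCounts.remove(blockId);
            if (pendingDelete.remove(blockId)) {
                removeList.add(blockId); // last reference gone; delete may proceed
            }
        }
    }

    // NN wants the block deleted (e.g. after truncate).
    public synchronized void requestDelete(long blockId) {
        if (refCounts.getOrDefault(blockId, 0) > 0) {
            pendingDelete.add(blockId); // defer until unreferenced
        } else {
            removeList.add(blockId);
        }
    }

    public synchronized boolean isDeletable(long blockId) {
        return removeList.contains(blockId);
    }
}
```

The point of the sketch is the ordering guarantee: requestDelete on a referenced block parks it in pendingDelete, and only the final unreference moves it to the remove list.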

Solution 2: the block is moved to trash on the DN and its deletion is delayed.

In the existing implementation, the DN deletes blocks after the heartbeat response from NN
(lazy block deletion).

If a client is already reading a block when deletion is requested, the DN should delete the
block only after the read completes.

In most cases the client can still read the last block:

1) Client requests block location information

2) HDFS client copies the block to a local buffer

3) Heartbeat response asks the DN to delete the block (lazy block deletion)

4) HDFS application slowly reads data from the local buffer

For the race between 2) and 3), we can delay block deletion.

Even if the block is already deleted, the client can request new block location information.
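The DN-side behavior of solution 2 could be sketched as below. This is a minimal hypothetical illustration, not existing DataNode code: BlockTrash and the beginRead/endRead/requestDelete names are invented for the sketch, and purge() stands in for the actual on-disk block file deletion.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of solution 2: on the heartbeat delete request the DN
// marks the block as trashed and purges it only once in-flight reads finish.
public class BlockTrash {
    private final Map<Long, AtomicInteger> activeReads = new ConcurrentHashMap<>();
    private final Set<Long> trashed = ConcurrentHashMap.newKeySet();
    private final Set<Long> purged = ConcurrentHashMap.newKeySet();

    // A client starts reading the block.
    public void beginRead(long blockId) {
        activeReads.computeIfAbsent(blockId, k -> new AtomicInteger()).incrementAndGet();
    }

    // The client finished reading; if the block was trashed meanwhile, purge it now.
    public void endRead(long blockId) {
        AtomicInteger c = activeReads.get(blockId);
        if (c != null && c.decrementAndGet() == 0 && trashed.contains(blockId)) {
            purge(blockId);
        }
    }

    // Heartbeat response asked the DN to delete the block (lazy deletion).
    public void requestDelete(long blockId) {
        trashed.add(blockId);
        AtomicInteger c = activeReads.get(blockId);
        if (c == null || c.get() == 0) {
            purge(blockId); // no reader in flight; delete immediately
        }
    }

    public boolean isPurged(long blockId) {
        return purged.contains(blockId);
    }

    private void purge(long blockId) {
        trashed.remove(blockId);
        purged.add(blockId); // stand-in for deleting the block file from disk
    }
}
```

This captures the key property of solution 2: a delete request arriving mid-read leaves the block in trash, and the actual purge happens when the last reader completes.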

I prefer solution 2.

> Unify HDFS write/append/truncate
> --------------------------------
>                 Key: HDFS-6087
>                 URL: https://issues.apache.org/jira/browse/HDFS-6087
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>            Reporter: Guo Ruijing
>         Attachments: HDFS Design Proposal.pdf, HDFS Design Proposal_3_14.pdf
> In the existing implementation, an HDFS file can be appended and an HDFS block can be
> reopened for append. This design introduces complexity, including lease recovery. If we
> design the HDFS block as immutable, append & truncate become very simple. The idea is
> that an HDFS block is immutable once it is committed to the namenode; if the block is
> not committed to the namenode, it is the HDFS client's responsibility to re-add it with
> a new block ID.

This message was sent by Atlassian JIRA
