hbase-issues mailing list archives

From "Nicolas Spiegelberg (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2234) Roll Hlog if any datanode in the write pipeline dies
Date Sat, 06 Mar 2010 23:30:27 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12842343#action_12842343 ]

Nicolas Spiegelberg commented on HBASE-2234:

Go ahead and apply the existing patch with the comment changes.

I spent a little bit of time yesterday trying to understand all the layers of buffering between
the SequenceFile.Writer and the point where the pipeline is actually opened and content is sent
to the datanodes.  I figured I'd pass that information along, since 0.20.2 currently does not
support syncFs().  Without syncFs, the pipeline seems to be created every 64k, which is
'dfs.write.packet.size'.  The stack trace, with associated buffering, that I was following:

1. SequenceFile.Writer.append()
2. FSOutputSummer.write()              --> buffers to maxChunkSize. An HDFS chunk is the
amount of data in between checksums. (default: 512 bytes)
3. FSOutputSummer.flushBuffer()
4. FSOutputSummer.writeChecksumChunk()
5. DFSOutputStream.writeChunk()  --> buffers to currentPacket.maxChunk.  This is the maximum
number of HDFS chunks that can be placed in a Packet.  The approximate byte count is
min("dfs.block.size" (default: 64MB), "hbase.regionserver.hlog.blocksize"
(default: "dfs.block.size"), "dfs.write.packet.size" (default: 64k)).
6. DataStreamer.run() <-- creates the pipeline
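To make the chunk/packet arithmetic in the list above concrete, here is a minimal standalone sketch. This is not HDFS source; the class and method names are mine, and it only illustrates how a 64k packet (dfs.write.packet.size) is carved into 512-byte checksummed chunks before the pipeline sees any data:

```java
// Hypothetical sketch of the packet-sizing arithmetic, not actual HDFS code.
public class PacketSizing {
    // CRC32 checksum stored per chunk (4 bytes).
    static final int CHECKSUM_SIZE = 4;

    // How many checksummed chunks fit in one packet: each chunk occupies
    // its data bytes plus its checksum bytes inside the packet buffer.
    static int chunksPerPacket(int packetSize, int bytesPerChecksum) {
        int chunkSize = bytesPerChecksum + CHECKSUM_SIZE;
        return Math.max(1, packetSize / chunkSize);
    }

    public static void main(String[] args) {
        int packetSize = 64 * 1024;   // dfs.write.packet.size default (64k)
        int bytesPerChecksum = 512;   // default HDFS chunk size
        // Data is buffered through FSOutputSummer in 512-byte chunks and only
        // handed to DataStreamer once a full packet's worth has accumulated.
        System.out.println(chunksPerPacket(packetSize, bytesPerChecksum));
    }
}
```

This is why, without syncFs(), nothing reaches the datanodes until roughly a packet's worth of appends has accumulated: edits sit in the FSOutputSummer and Packet buffers until the packet fills.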

> Roll Hlog if any datanode in the write pipeline dies
> ----------------------------------------------------
>                 Key: HBASE-2234
>                 URL: https://issues.apache.org/jira/browse/HBASE-2234
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: dhruba borthakur
>            Assignee: Nicolas Spiegelberg
>            Priority: Blocker
>             Fix For: 0.20.4, 0.21.0
>         Attachments: HBASE-2234-20.4-1.patch, HBASE-2234-20.4.patch
> HDFS does not replicate the last block of a file that is being written to. This means
> that if datanodes in the write pipeline die, then the data blocks in the transaction log would
> be experiencing reduced redundancy. It would be good if the region server could detect datanode-death
> in the write pipeline while writing to the transaction log and, if this happens, close the
> current log and open a new one. This depends on HDFS-826

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
