hadoop-hdfs-issues mailing list archives

From "Walter Su (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9040) Erasure coding: Refactor DFSStripedOutputStream (Move Namenode RPC Requests to Coordinator)
Date Thu, 17 Sep 2015 07:51:46 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791739#comment-14791739 ]

Walter Su commented on HDFS-9040:

bq. should we just do updatePipeline when completing the block?
1. In the read-while-writing scenario, there would be a longer window of *false-fresh* (meaning a stale internal block is considered fresh).
We should do updatePipeline before hflush/hsync as well, not only when completing the block.

bq. 2. When NUM_PARITY_BLOCKS number of streamers are dead, the OutputStream should die immediately
instead of waiting for the next writeChunk.
A failed streamer is currently detected in writeChunk. We plan to add periodic checking; [~jingzhao]
mentioned that before.

bq. 3. We might want to add the logic to replace a failed StripedDataStreamer in the future.
No, we won't, I think, if you mean something like DataNode replacement for a replicated block.
For replication you can transfer a healthy RBW replica to a new DataNode and still have 3 DNs
after the replacement, but recovering a corrupted RBW internal block is difficult.

I have a question: instead of delaying the refresh, do we even need to refresh UC.replicas at all?
1. A client reading a UC block being written can decode a replica if it misses some part. (With
checksum verification, we only need to worry about 'missing' data.)
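The decoding idea in point 1 can be sketched as follows. This is illustrative only: HDFS erasure coding uses Reed-Solomon, but a single-parity XOR scheme (my simplification, not the real codec) shows the same principle, namely that a reader can reconstruct one missing cell of a stripe from the surviving cells.

```python
def make_parity(data_cells):
    """XOR all data cells into one parity cell (cells must be equal length)."""
    parity = bytearray(len(data_cells[0]))
    for cell in data_cells:
        for i, b in enumerate(cell):
            parity[i] ^= b
    return bytes(parity)

def decode_missing(cells, missing_index):
    """Rebuild the cell at missing_index by XORing all surviving cells
    (data + parity); cells[missing_index] itself is ignored."""
    rebuilt = bytearray(len(cells[0]))
    for idx, cell in enumerate(cells):
        if idx == missing_index:
            continue
        for i, b in enumerate(cell):
            rebuilt[i] ^= b
    return bytes(rebuilt)

data = [b"aaaa", b"bbbb", b"cccc"]
stripe = data + [make_parity(data)]      # 3 data cells + 1 parity cell
assert decode_missing(stripe, 1) == b"bbbb"   # a lost data cell is recovered
```

With checksums ruling out silent corruption, "missing" really is the only failure mode the reader has to decode around, which is why a stale locations list is tolerable for reads.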
2. Block recovery / lease recovery truncates all RBW replicas to the minimal length for a replicated
block. For striping, assume a corrupted internal block has a small length, e.g. 200KB, while the 8
healthy internal blocks are longer, e.g. in (1MB - cellSize, 1MB + cellSize). Of course after
recovery we should truncate the 8 healthy blocks to 1MB (they should end in the same last stripe,
but whether we should truncate that last stripe is not my point). My point is that we can rule out
the corrupted internal blocks by {{commitBlockSynchronization}}.
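A rough sketch of the rule-out step in point 2 (the numbers and the tolerance rule are my illustration, not HDFS's actual recovery algorithm): an internal block far shorter than the rest is ruled out as corrupted/stale, and the survivors are truncated to their common minimal length.

```python
CELL_SIZE = 64 * 1024  # hypothetical cell size for this sketch

def safe_lengths(internal_lengths, tolerance_cells=1):
    """Return (survivors, truncate_to): the indices of internal blocks
    kept and the length all survivors are truncated to. A block more
    than `tolerance_cells` cells shorter than the longest is ruled out
    as corrupted/stale (illustrative rule, not the NN's)."""
    longest = max(internal_lengths)
    survivors = [i for i, n in enumerate(internal_lengths)
                 if longest - n <= tolerance_cells * CELL_SIZE]
    truncate_to = min(internal_lengths[i] for i in survivors)
    return survivors, truncate_to

# 8 healthy internal blocks near 1MB, one corrupted block stuck at 200KB:
lengths = [1024 * 1024] * 4 + [1024 * 1024 - CELL_SIZE] * 4 + [200 * 1024]
survivors, cut = safe_lengths(lengths)
assert 8 not in survivors               # the 200KB block is ruled out
assert cut == 1024 * 1024 - CELL_SIZE   # survivors truncated to the minimum
```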
3. Maintaining the indices of UC.replicas: updating UC.replicas from a BlockReport is safe, because
the reportedBlock carries its ID. But if UC.replicas is updated by updatePipeline, the indices are
derived from array offsets (see {{UC.setExpectedLocations()}}), which is error prone. If we don't
refresh UC.replicas, we are pretty safe.
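A minimal illustration of the hazard in point 3 (a toy data model, not the NameNode's actual classes): when the index is just the array offset, removing one entry silently shifts the meaning of every later index, whereas keying by a reported block ID stays stable.

```python
# Index == array offset (how updatePipeline-derived indices behave):
replicas = ["dn0", "dn1", "dn2", "dn3"]

# A failed streamer's entry is dropped and the array compacted:
compacted = [dn for dn in replicas if dn != "dn1"]
shifted = {i: dn for i, dn in enumerate(compacted)}
assert shifted[1] == "dn2"   # offset 1 now silently means a different DN

# Keying by block ID (as a BlockReport's reportedBlock allows):
by_id = {"blk_100": (0, "dn0"), "blk_101": (1, "dn1"),
         "blk_102": (2, "dn2"), "blk_103": (3, "dn3")}
del by_id["blk_101"]
assert by_id["blk_102"] == (2, "dn2")   # remaining mappings are unchanged
```

This is why BlockReport-driven updates are safe while offset-derived updates are fragile: the ID pins each replica to its striping index regardless of how the collection is reshuffled.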

> Erasure coding: Refactor DFSStripedOutputStream (Move Namenode RPC Requests to Coordinator)
> -------------------------------------------------------------------------------------------
>                 Key: HDFS-9040
>                 URL: https://issues.apache.org/jira/browse/HDFS-9040
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Walter Su
>         Attachments: HDFS-9040-HDFS-7285.002.patch, HDFS-9040-HDFS-7285.003.patch, HDFS-9040.00.patch, HDFS-9040.001.wip.patch, HDFS-9040.02.bgstreamer.patch
> The general idea is to simplify error handling logic.
> Proposal 1:
> A BlockGroupDataStreamer to communicate with NN to allocate/update blocks, and StripedDataStreamers only have to stream blocks to DNs.
> Proposal 2:
> See the [comment|https://issues.apache.org/jira/browse/HDFS-9040?focusedCommentId=14741388&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14741388] below from [~jingzhao].

This message was sent by Atlassian JIRA
