hadoop-hdfs-issues mailing list archives

From "Walter Su (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9040) Erasure coding: Refactor DFSStripedOutputStream (Move Namenode RPC Requests to Coordinator)
Date Wed, 23 Sep 2015 03:28:04 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14903889#comment-14903889 ]

Walter Su commented on HDFS-9040:
---------------------------------

bq. 1. Flush out all the enqueued data to DataNodes before handling failures and bumping GS.
Great. It's much simpler. In checkStreamerFailures(boolean toClose), you call flushAllInternals
anyway before you start handling failures, and it doesn't hurt to flush twice. So is {{toClose}} unnecessary?
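To illustrate (a rough sketch only, not code from the patch; checkStreamers() and handleStreamerFailures() are placeholder names for the patch's helpers):
{code}
// Sketch: flush unconditionally before handling failures, so the
// toClose flag can be dropped. Flushing an already-empty dataQueue
// is harmless, so an occasional double flush costs nothing.
private void checkStreamerFailures() throws IOException {
  final Set<StripedDataStreamer> newFailed = checkStreamers();
  if (newFailed.isEmpty()) {
    return;
  }
  // Flush out all enqueued data before bumping the GS.
  flushAllInternals();
  handleStreamerFailures(newFailed);
}
{code}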
bq. 3. During the test I found that some data streamers may take a long time to close/create
datanode connections. This may cause other streamers' connections to time out. Thus the new patch
adds an upper bound on the total waiting time for creating datanode connections during failure
handling.
bq. +   && remaingTime > waitInterval * 2) {
This approach is not good enough. {{socketTimeout}} is 6s by default, but here you wait at most 4s.
Remember, you just called flushAllInternals(). When dataQueue.size() == 0, a healthy streamer
could be sleeping for up to {{halfSocketTimeout}}, i.e. 3s. So you give that streamer only 1s
to create its blockStream and offer to updateStreamerMap, and if it doesn't finish in 1s, you kill it.
I think we should notify every dataQueue to wake up the streamers after markExternalErrorOnStreamers(),
so that every streamer gets the full 4s. It would be even better if streamers started sending
heartbeat packets while waiting for the other streamers, but that's too hard.
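For example (a sketch only; I'm assuming each streamer's dataQueue is the monitor it sleeps on, as in DataStreamer, and getDataQueue() is a hypothetical accessor):
{code}
// Sketch: wake every streamer sleeping in dataQueue.wait(halfSocketTimeout)
// right after marking the external error, so each one gets the full
// waiting budget for creating its blockStream.
markExternalErrorOnStreamers();
for (StripedDataStreamer streamer : streamers) {
  final Object queue = streamer.getDataQueue();  // hypothetical accessor
  synchronized (queue) {
    queue.notifyAll();
  }
}
{code}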
bq. 2. Instead of letting each DataStreamer write its own last empty packet of the block, we
do it at the StripedOutputStream level so that we can still bump the GS for failure handling before
some streamers close their internal blocks.
{code}
        if (shouldEndBlockGroup()) {
          for (int i = 0; i < numAllBlocks; i++) {
            final StripedDataStreamer s = setCurrentStreamer(i);
            if (s.isHealthy()) {
              endBlock();
            }
          }
        }
{code}
The logic looks good. Until we have a solution for PIPELINE_CLOSE_RECOVERY, should we catch
the exception thrown by endBlock() and ignore it?
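That is, something like this (a sketch; only the try/catch is new, the rest is the snippet above):
{code}
        if (shouldEndBlockGroup()) {
          for (int i = 0; i < numAllBlocks; i++) {
            final StripedDataStreamer s = setCurrentStreamer(i);
            if (s.isHealthy()) {
              try {
                endBlock();
              } catch (IOException ignored) {
                // No PIPELINE_CLOSE_RECOVERY support yet: swallow the
                // failure here and let it surface as a failed streamer.
              }
            }
          }
        }
{code}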

> Erasure coding: Refactor DFSStripedOutputStream (Move Namenode RPC Requests to Coordinator)
> -------------------------------------------------------------------------------------------
>
>                 Key: HDFS-9040
>                 URL: https://issues.apache.org/jira/browse/HDFS-9040
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Walter Su
>            Assignee: Jing Zhao
>         Attachments: HDFS-9040-HDFS-7285.002.patch, HDFS-9040-HDFS-7285.003.patch, HDFS-9040-HDFS-7285.004.patch, HDFS-9040.00.patch, HDFS-9040.001.wip.patch, HDFS-9040.02.bgstreamer.patch
>
>
> The general idea is to simplify error handling logic.
> Proposal 1:
> A BlockGroupDataStreamer to communicate with NN to allocate/update blocks, so that StripedDataStreamers only have to stream blocks to DNs.
> Proposal 2:
> See below the [comment|https://issues.apache.org/jira/browse/HDFS-9040?focusedCommentId=14741388&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14741388]
from [~jingzhao].



