hadoop-hdfs-issues mailing list archives

From "Michael Rose (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-10597) DFSClient hangs if using hedged reads and all but one eligible replica is down
Date Tue, 05 Jul 2016 21:46:10 GMT
Michael Rose created HDFS-10597:
-----------------------------------

             Summary: DFSClient hangs if using hedged reads and all but one eligible replica is down
                 Key: HDFS-10597
                 URL: https://issues.apache.org/jira/browse/HDFS-10597
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs-client
    Affects Versions: 2.7.0, 2.6.0
            Reporter: Michael Rose


If hedged reads are enabled and only a single datanode is available, the hedged read loop
respects the ignored nodes list and never sends more than one request, yet it keeps retrying
for quite some time while trying to choose a datanode.

This is unfortunate, because within the scope of a single request the ignored nodes list is
only ever added to and never removed from. As a result, a single failed read either fails the
entire request *or* delays the response.

There's actually a secondary undesirable behavior here too. To set the stage, let's say 10ms
is the hedged read timeout and we only have a single replica available. If a hedged read can't
find a datanode, it will delay a successful response considerably.

1. [0ms] `DFSInputStream#hedgedFetchBlockByteRange`: the first (non-hedged) read is sent to DN1;
the read takes 50ms to succeed. ignoredNodes=[DN1]
2. [+10ms] `DFSInputStream#chooseDataNode` is called. As ignoredNodes includes DN1, we re-query
the NameNode for block locations and sleep, trying again.
3. [+3000ms] `DFSInputStream#chooseDataNode` is called. As ignoredNodes includes DN1, we re-query
the NameNode for block locations and sleep, trying again.
4. [+3000ms+6000ms] `DFSInputStream#chooseDataNode` is called. As ignoredNodes includes DN1,
we re-query the NameNode for block locations and sleep, trying again.
5. [+6000ms+9000ms] `DFSInputStream#chooseDataNode` is called. As ignoredNodes includes DN1,
we re-query the NameNode for block locations and sleep, trying again.
6. [27010ms] Control flow is restored to `DFSInputStream#hedgedFetchBlockByteRange`; the completion
service is polled and the read that succeeded at [50ms] is returned successfully, but with
+27000ms of extra delay (worst case; the expected value would be half).
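
For reference, the worst-case figures in the timeline above can be reproduced with a few lines
of arithmetic. This is only a sketch: the 3000ms window and the three sleeps are read off the
timeline, and the formula (a fixed part of `window * failures` plus a random part of at most
`window * (failures + 1)`) is an assumption fitted to those numbers, not code copied from
`DFSInputStream`. The class and method names are hypothetical.

```java
/** Sketch: worst-case time spent sleeping between datanode selection attempts. */
final class HedgedReadDelaySketch {

    /**
     * Upper bound of one sleep, given how many attempts have already failed.
     * Assumed shape: fixed part (window * failures) plus the maximum of the
     * random part (window * (failures + 1)).
     */
    static long worstCaseSleepMs(long windowMs, int failures) {
        return windowMs * failures + windowMs * (failures + 1);
    }

    /** Sum of the worst-case sleeps over the given number of retries. */
    static long worstCaseTotalMs(long windowMs, int retries) {
        long total = 0;
        for (int failures = 0; failures < retries; failures++) {
            total += worstCaseSleepMs(windowMs, failures);
        }
        return total;
    }

    public static void main(String[] args) {
        long windowMs = 3000;   // assumed retry window, taken from the timeline
        int retries = 3;        // assumed number of sleeps, taken from the timeline
        long hedgeTimeoutMs = 10;

        long elapsed = hedgeTimeoutMs;
        for (int failures = 0; failures < retries; failures++) {
            long sleep = worstCaseSleepMs(windowMs, failures);
            System.out.println("at " + elapsed + "ms: no eligible datanode, sleep up to " + sleep + "ms");
            elapsed += sleep;
        }
        System.out.println("at " + elapsed + "ms: control returns to the hedged read loop");
    }
}
```

With these assumed inputs the three sleeps are bounded by 3000ms, 9000ms and 15000ms, which adds
up to the +27000ms quoted in the last step.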

This is only one scenario (a happy scenario). Supposing that the first read eventually fails,
the DFSClient will still go through the same series of retries inside of
`DFSInputStream#hedgedFetchBlockByteRange` before failing.

I've identified one way to fix the behavior, but I'd be interested in thoughts:

In `DFSInputStream#getBestNodeDNAddrPair`, there's a check to see if a node is in the ignored
list before allowing it to be returned. Amending this check to short-circuit if there's only
a single available node avoids the regrettably useless retries, that is:

`nodes.length == 1 || ignoredNodes == null || !ignoredNodes.contains(nodes[i])`
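
To make the effect of that condition concrete, here is a minimal, self-contained sketch that
compares the check with and without the `nodes.length == 1` short-circuit. It is not the real
`getBestNodeDNAddrPair`: datanodes are modelled as plain strings, and the class and method names
are hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Optional;

/** Sketch: node selection with and without the single-node short-circuit. */
final class BestNodeSketch {

    /** Existing behaviour as described in the report: never return an ignored node. */
    static Optional<String> chooseOriginal(String[] nodes, Collection<String> ignoredNodes) {
        for (int i = 0; i < nodes.length; i++) {
            if (ignoredNodes == null || !ignoredNodes.contains(nodes[i])) {
                return Optional.of(nodes[i]);
            }
        }
        return Optional.empty();
    }

    /** Proposed behaviour: if only one node exists, return it even when it is ignored. */
    static Optional<String> chooseAmended(String[] nodes, Collection<String> ignoredNodes) {
        for (int i = 0; i < nodes.length; i++) {
            if (nodes.length == 1 || ignoredNodes == null || !ignoredNodes.contains(nodes[i])) {
                return Optional.of(nodes[i]);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String[] single = {"DN1"};
        Collection<String> ignored = Arrays.asList("DN1");
        // Original check finds nothing, which is what triggers the re-query and sleep.
        System.out.println("original: " + chooseOriginal(single, ignored));
        // Amended check hands back the only replica immediately.
        System.out.println("amended:  " + chooseAmended(single, ignored));
    }
}
```

With more than one node the two versions behave identically, so the amendment only changes the
single-replica case.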

However, with this change, if there's only one DN available, it'll send the hedged request
to it as well. Better behavior would be to fail hedged requests quickly *or* push the waiting
work into the hedge pool so that successful, fast reads aren't blocked by this issue.
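
One possible shape for the second option (pushing the waiting work into the hedge pool) is
sketched below using only `java.util.concurrent`. It is an illustration rather than a patch:
`readWithHedge` and its parameters are hypothetical, and the hedged task is assumed to do its own
datanode selection, including any sleeping, on a pool thread so the caller can still pick up the
first read as soon as it completes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Sketch: keep slow hedge setup off the caller thread so a fast first read is not delayed. */
final class HedgePoolSketch {

    /**
     * Returns the first successful result. The hedged task (node selection plus read)
     * is only submitted if the first read has not finished within hedgeDelayMs.
     * The pool needs at least two threads for the hedge to run concurrently.
     */
    static String readWithHedge(ExecutorService pool,
                                Callable<String> firstRead,
                                Callable<String> hedgedRead,
                                long hedgeDelayMs) throws Exception {
        CompletionService<String> completed = new ExecutorCompletionService<>(pool);
        List<Future<String>> pending = new ArrayList<>();
        pending.add(completed.submit(firstRead));
        try {
            int outstanding = 1;
            Future<String> done = completed.poll(hedgeDelayMs, TimeUnit.MILLISECONDS);
            if (done == null) {
                // First read is slow: start the hedge, but do the waiting in the pool.
                pending.add(completed.submit(hedgedRead));
                outstanding = 2;
            }
            ExecutionException failure = null;
            while (outstanding > 0) {
                if (done == null) {
                    done = completed.take(); // blocks until any submitted task finishes
                }
                outstanding--;
                try {
                    return done.get();
                } catch (ExecutionException e) {
                    failure = e;   // remember the failure and wait for the other task
                    done = null;
                }
            }
            throw failure;
        } finally {
            for (Future<String> f : pending) {
                f.cancel(true);    // interrupt whichever task is still running
            }
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            String result = readWithHedge(
                pool,
                () -> { Thread.sleep(50); return "data from DN1"; },
                () -> { Thread.sleep(5000); throw new java.io.IOException("no eligible datanode"); },
                10);
            System.out.println(result);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

In this arrangement the caller waits roughly as long as the first read takes, instead of the full
retry window, because the hedge's search for a datanode no longer runs on the calling thread.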

In our situation, we run an HBase cluster with HDFS RF=2 and hedged reads enabled. Stopping
a single datanode brings the cluster to a grinding halt.

You can observe this behavior yourself by editing TestPread#testMaxOutHedgedReadPool's MiniDFSCluster
to have a single datanode.




