hadoop-hdfs-issues mailing list archives

From "Michael Rose (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-10597) DFSClient hangs if using hedged reads and all but one eligible replica is down
Date Tue, 05 Jul 2016 22:00:12 GMT

     [ https://issues.apache.org/jira/browse/HDFS-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Rose updated HDFS-10597:
--------------------------------
    Affects Version/s: 2.4.0
                       2.5.0

> DFSClient hangs if using hedged reads and all but one eligible replica is down 
> -------------------------------------------------------------------------------
>
>                 Key: HDFS-10597
>                 URL: https://issues.apache.org/jira/browse/HDFS-10597
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>    Affects Versions: 2.4.0, 2.5.0, 2.6.0, 2.7.0
>            Reporter: Michael Rose
>
> If hedged reads are enabled and only a single datanode is available, the hedged
read loop still respects the ignored nodes list: it never sends more than one request, yet it
retries for quite some time trying to choose a datanode.
> This is unfortunate: within the scope of a single request, nodes are only ever added to the
ignored nodes list and never removed, so a single failed read either fails the entire request
*or* delays the response.
> There's actually a secondary undesirable behavior here too. If a hedged read can't find
a datanode, it will delay a successful response considerably. To set the stage, let's say 10ms
is the hedged read timeout and we only have a single replica available, that is, nodes=[DN1].

> 1. [0ms] {{DFSInputStream#hedgedFetchBlockByteRange}} The first (non-hedged) read is sent
to DN1. This read will take 50ms to succeed. ignoredNodes=[DN1]
> 2. [10ms] Poll timeout. Send hedged request
> 3. [10ms] {{DFSInputStream#chooseDataNode}} is called to find a node for the hedged request.
As ignoredNodes includes DN1, there are no nodes available and we re-query the NameNode for
block locations and sleep, trying again.
> 4. [+3000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes includes DN1,
we re-query the NameNode for block locations and sleep, trying again.
> 5. [+3000+6000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes includes
DN1, we re-query the NameNode for block locations and sleep, trying again.
> 6. [+6000+9000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes includes
DN1, we re-query the NameNode for block locations and sleep, trying again.
> 7. [27010ms] Control flow is restored to {{DFSInputStream#hedgedFetchBlockByteRange}}, the completion
service is polled, and the read that succeeded at [50ms] is returned successfully, but up to 27000ms
late (worst case; the expected delay is lower, since part of each sleep is random).
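
The delays in steps 3 to 6 can be reproduced with a little arithmetic. The sketch below assumes each sleep is a fixed part of `failures * 3000ms` plus a random part of up to `(failures + 1) * 3000ms`, which is what the annotations on steps 4 to 6 suggest, and that three retries are made. The class, constants, and method names are hypothetical; this is an illustration, not the DFSClient code itself.

```java
/** Hypothetical sketch of the retry delays implied by the steps above. */
final class HedgedReadDelaySketch {
    // Assumed values, inferred from the step annotations in this report.
    static final int TIME_WINDOW_MS = 3000;
    static final int MAX_FAILURES = 3;

    /** Sleep before retry number `failures` (0-based); `random` is a draw in [0, 1). */
    static double waitTimeMs(int failures, double random) {
        return TIME_WINDOW_MS * failures                   // fixed part grows per failure
             + TIME_WINDOW_MS * (failures + 1) * random;   // random part widens per failure
    }

    /** Total sleep over all retries when every draw equals `random`. */
    static double totalDelayMs(double random) {
        double total = 0;
        for (int failures = 0; failures < MAX_FAILURES; failures++) {
            total += waitTimeMs(failures, random);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("upper bound (ms): " + totalDelayMs(1.0));
        System.out.println("mean (ms):        " + totalDelayMs(0.5));
    }
}
```

Under these assumptions `totalDelayMs(1.0)` gives the 27000ms upper bound from step 7, and `totalDelayMs(0.5)` gives a mean of 18000ms, because the total is linear in the random draws.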
> This is only one scenario (a happy scenario). Supposing that the first read eventually
fails, the DFSClient will still retry inside of {{DFSInputStream#hedgedFetchBlockByteRange}}
for the same number of retries before failing.
> I've identified one way to fix the behavior, but I'd be interested in thoughts:
> In {{DFSInputStream#getBestNodeDNAddrPair}}, there's a check to see if a node is in the
ignored list before allowing it to be returned. Amending this check to short-circuit if there's
only a single available node avoids the regrettably useless retries, that is:
> {{nodes.length == 1 || ignoredNodes == null || !ignoredNodes.contains(nodes[i])}}
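
To make the effect of that condition concrete, here is a minimal standalone sketch. It models nodes as plain strings and uses a hypothetical `chooseNode` helper with a flag to switch the short-circuit on and off; it is not the real `getBestNodeDNAddrPair` signature, only the eligibility test quoted above.

```java
import java.util.Collection;

/** Hypothetical model of the ignored-nodes check; nodes are plain strings here. */
final class NodeSelectionSketch {
    /**
     * Returns the first eligible node, or null if every candidate is ignored.
     * `allowSingleNode` toggles the proposed `nodes.length == 1` short-circuit.
     */
    static String chooseNode(String[] nodes, Collection<String> ignoredNodes,
                             boolean allowSingleNode) {
        if (nodes == null) {
            return null;
        }
        for (int i = 0; i < nodes.length; i++) {
            boolean eligible = (allowSingleNode && nodes.length == 1)
                    || ignoredNodes == null
                    || !ignoredNodes.contains(nodes[i]);
            if (eligible) {
                return nodes[i];
            }
        }
        return null; // in the scenario above, this is what triggers the sleep-and-retry loop
    }
}
```

With `nodes=[DN1]` and `ignoredNodes=[DN1]`, the original check yields no node, while the amended check returns DN1 immediately. With two or more nodes the ignored list is honored exactly as before.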
> However, with this change, if there's only one DN available, it'll send the hedged request
to it as well. Better behavior would be to fail hedged requests quickly *or* push the waiting
work into the hedge pool so that successful, fast reads aren't blocked by this issue.
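
The second alternative, moving the waiting into the hedge pool, could look roughly like the following. This is a simulation under stated assumptions, not DFSClient code: reads are stand-in tasks that sleep, `selectionDelayMs` stands in for the sleep-and-retry loop in node selection, and all names are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical simulation: slow node selection runs in the pool, not on the caller. */
final class HedgePoolSketch {
    /** Simulated read: sleeps for latencyMs, then returns the node name. */
    static Callable<String> read(String node, long latencyMs) {
        return () -> {
            Thread.sleep(latencyMs);
            return node;
        };
    }

    /** Returns the result of whichever task completes first. */
    static String fetch(long readLatencyMs, long hedgeTimeoutMs, long selectionDelayMs) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            CompletionService<String> done = new ExecutorCompletionService<>(pool);
            done.submit(read("DN1", readLatencyMs)); // first, non-hedged read

            Future<String> first = done.poll(hedgeTimeoutMs, TimeUnit.MILLISECONDS);
            if (first != null) {
                return first.get(); // finished before the hedge timeout
            }

            // Hedge: the slow "find another node" step is submitted to the pool,
            // so the caller can go straight back to waiting on completions.
            done.submit(() -> {
                Thread.sleep(selectionDelayMs);
                return "hedged";
            });
            return done.take().get(); // first completion wins
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow(); // cancels the still-sleeping hedge task
        }
    }
}
```

Calling `HedgePoolSketch.fetch(50, 10, 27000)` models the scenario above: the caller gets DN1's result once the 50ms read completes, rather than waiting out the selection delay.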
> In our situation, we run an HBase cluster with HDFS RF=2 and hedged reads enabled; stopping
a single datanode brings the cluster to a grinding halt.
> You can observe this behavior yourself by editing {{TestPread#testMaxOutHedgedReadPool}}'s
MiniDFSCluster to have a single datanode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

