hadoop-hdfs-issues mailing list archives

From "Arpit Agarwal (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5776) Support 'hedged' reads in DFSClient
Date Tue, 28 Jan 2014 02:46:38 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883685#comment-13883685 ]

Arpit Agarwal commented on HDFS-5776:

{quote}
Yes, that would be perfect sometimes, but it does not work for the HBase scenario (Stack's consideration above is a good one), since we made the pool static. From a per-client view it is more flexible if we provide instance-level enable/disable APIs, so we can use the HBase shell script to control the switch per DFSClient instance; that would be cooler.
{quote}
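The instance-level switch being asked for could look roughly like the sketch below. This is a hypothetical API, not the patch's actual code: it keeps the shared static pool described in the quote but adds a per-instance volatile flag, so each client can be toggled independently (e.g. from the HBase shell).

```java
// Hypothetical per-instance hedging switch; names are illustrative only.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PerClientToggleSketch {
    // One static pool shared by all client instances, as in the quote.
    private static final ExecutorService HEDGE_POOL =
        Executors.newFixedThreadPool(8);

    // Per-instance flag: flipping it affects only this client.
    private volatile boolean hedgedReadsEnabled = true;

    public void disableHedgedReads() { hedgedReadsEnabled = false; }
    public void enableHedgedReads()  { hedgedReadsEnabled = true; }
    public boolean isHedgedReadsEnabled() { return hedgedReadsEnabled; }

    public static void main(String[] args) {
        PerClientToggleSketch a = new PerClientToggleSketch();
        PerClientToggleSketch b = new PerClientToggleSketch();
        a.disableHedgedReads();   // only client "a" stops hedging
        System.out.println(a.isHedgedReadsEnabled() + " "
            + b.isHedgedReadsEnabled());  // prints "false true"
        HEDGE_POOL.shutdown();
    }
}
```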

{quote}
In {{actualGetFromOneDataNode()}}, refetchToken/refetchEncryptionKey are initialized outside the {{while (true)}} loop (see lines 993-996). When we hit InvalidEncryptionKeyException/InvalidBlockTokenException, refetchToken and refetchEncryptionKey are each decremented by 1 (see the {{refetchEncryptionKey--}} and {{refetchToken--}} statements). If the same exception happens again, the checks ("e instanceof InvalidEncryptionKeyException && refetchEncryptionKey > 0" and "refetchToken > 0") will definitely fail, so we fall through to the else clause, which executes:
{quote}
Isn't the call to {{actualGetFromOneDataNode}} wrapped in a loop itself? I am talking about
the while loop in {{fetchBlockByteRange}}. Will that not change the behavior? Maybe it is
harmless, I am not sure. I just want us to be clear either way.
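The retry-budget pattern under discussion can be sketched as follows. This is an illustration of the pattern, not the DFSClient source: the budgets are initialized outside the retry loop, so each failure type triggers at most one refetch per call (and, per the question above, an outer loop that re-invokes the method would reset those budgets).

```java
// Minimal sketch of the retry-budget pattern; exception classes are
// stand-ins for the real HDFS exception types.
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

public class RetryBudgetSketch {
    static final AtomicInteger fetches = new AtomicInteger();

    static class InvalidEncryptionKeyException extends IOException {}
    static class InvalidBlockTokenException extends IOException {}

    static byte[] actualGetFromOneDataNode() throws IOException {
        // Budgets live OUTSIDE the retry loop: one refetch per failure type.
        int refetchToken = 1;
        int refetchEncryptionKey = 1;
        while (true) {
            try {
                return tryRead();
            } catch (InvalidEncryptionKeyException e) {
                if (refetchEncryptionKey > 0) {
                    refetchEncryptionKey--;   // refetch the key and retry once
                } else {
                    throw e;                  // budget exhausted: give up
                }
            } catch (InvalidBlockTokenException e) {
                if (refetchToken > 0) {
                    refetchToken--;           // refetch the token and retry once
                } else {
                    throw e;
                }
            }
        }
    }

    // Simulated read: fails with a stale key on the first attempt only.
    static byte[] tryRead() throws IOException {
        if (fetches.incrementAndGet() == 1) {
            throw new InvalidEncryptionKeyException();
        }
        return new byte[] { 42 };
    }

    public static void main(String[] args) throws IOException {
        byte[] data = actualGetFromOneDataNode();
        System.out.println(data[0] + " after " + fetches.get() + " attempts");
        // prints "42 after 2 attempts"
    }
}
```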

Thanks for adding the thread count limit. If we need more than 128 threads per client process
just for backup reads, we (HDFS) need to think about proper async RPC. Suggesting there be no
limit ignores the point that hedged reads can double the DataNode load on an already loaded
cluster. Also, a 1ms lower bound for the delay is as good as zero, but as long as we have a
thread count limit I am okay.
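The shape being reviewed (bounded pool plus a delay threshold before the hedge fires) can be sketched like this. It is a pattern illustration, not the DFSClient implementation: the fixed pool caps the extra DataNode load, and the hedge is only dispatched after the primary has had a head start.

```java
// Hedged-read pattern sketch using a bounded pool and a completion service.
import java.util.concurrent.*;

public class HedgedReadSketch {
    // Bounded pool: caps how many hedged reads the process can have in flight.
    static final ExecutorService pool = Executors.newFixedThreadPool(128);

    static String read(Callable<String> primary, Callable<String> hedge,
                       long thresholdMillis) throws Exception {
        CompletionService<String> cs = new ExecutorCompletionService<>(pool);
        cs.submit(primary);
        // Give the primary a head start before doubling the DataNode load.
        Future<String> first = cs.poll(thresholdMillis, TimeUnit.MILLISECONDS);
        if (first != null) {
            return first.get();           // primary won within the threshold
        }
        cs.submit(hedge);                 // dispatch the backup read
        return cs.take().get();           // whichever finishes first wins
    }

    public static void main(String[] args) throws Exception {
        Callable<String> slow = () -> { Thread.sleep(500); return "slow"; };
        Callable<String> fast = () -> "fast";
        System.out.println(read(slow, fast, 50));   // prints "fast"
        pool.shutdownNow();
    }
}
```

A threshold of 1ms, as noted above, effectively always fires the hedge, which is why the pool bound matters.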

Minor points that don't need to hold up the checkin:
# The test looks like a stress test, i.e. we are hoping that some of the hedged requests will
complete before the primary requests. We can create a separate Jira to write a deterministic
unit test, and it's fine if someone else picks that up later.
# A couple of points from my initial feedback (#10, #12) were missed, but again they are not
worth holding up the checkin.

Other than clarifying the loop behavior, the v9 patch looks fine to me.

Thanks again for working with the feedback Liang, this is a nice capability to have in HDFS.

> Support 'hedged' reads in DFSClient
> -----------------------------------
>                 Key: HDFS-5776
>                 URL: https://issues.apache.org/jira/browse/HDFS-5776
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 3.0.0
>            Reporter: Liang Xie
>            Assignee: Liang Xie
>         Attachments: HDFS-5776-v10.txt, HDFS-5776-v2.txt, HDFS-5776-v3.txt, HDFS-5776-v4.txt,
> HDFS-5776-v5.txt, HDFS-5776-v6.txt, HDFS-5776-v7.txt, HDFS-5776-v8.txt, HDFS-5776-v9.txt
> This is a placeholder for the HDFS-related backport from https://issues.apache.org/jira/browse/HBASE-7509
> The quorum read ability should be helpful, especially for optimizing read outliers.
> We can use "dfs.dfsclient.quorum.read.threshold.millis" & "dfs.dfsclient.quorum.read.threadpool.size"
to enable/disable the hedged read ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics,
we can export the interesting metric values into the client system (e.g. HBase's regionserver metrics).
> The core logic is in the pread code path: we decide whether to go to the original fetchBlockByteRange
or the newly introduced fetchBlockByteRangeSpeculative per the above config items.
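For reference, the two client-side keys named in the description would be set in the client configuration. This is a sketch using those key names; the values are illustrative only:

```xml
<!-- Hypothetical client-side settings; key names are from the description,
     values are examples. A threadpool size of 0 would disable hedged reads. -->
<property>
  <name>dfs.dfsclient.quorum.read.threshold.millis</name>
  <value>100</value>
</property>
<property>
  <name>dfs.dfsclient.quorum.read.threadpool.size</name>
  <value>16</value>
</property>
```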

This message was sent by Atlassian JIRA
