hadoop-hdfs-issues mailing list archives

From "Jing Zhao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5776) Support 'hedged' reads in DFSClient
Date Thu, 23 Jan 2014 19:48:41 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880279#comment-13880279 ]

Jing Zhao commented on HDFS-5776:

# In DFSClient, I agree with Arpit that we should remove the allowHedgedReads field and the
enable/disable methods. In the current code, whether hedged reads are enabled is determined
by the initial size of the hedgedReadThreadPool. If we provide these extra enable/disable
methods, what happens if a user of DFSClient sets the thread pool size to 0 and later calls
enableHedgedReads? Unless we have a clear use case for the enable/disable methods, I do not
think we need to provide this flexibility here.
An alternative is to have an "Allow-Hedged-Reads" configuration: if it is set to true, we
load the thread pool size and the threshold time. We would provide an isHedgedReadsEnabled
method but no enable/disable methods. I guess this may be easier for users to understand.
# Is the following scenario possible? In hedgedFetchBlockByteRange, if we hit the timeout for
the first DN, we add that DN to the ignore list and call chooseDataNode again. If the first
DN is the only DN we can read from, we get an IOException from bestNode. We then enter a loop
in which we keep trying to get another DN multiple times (some NN RPC calls will even be
fired), and during this process the first DN may well return the data. In this scenario I
guess we may get worse performance than with a plain read. So should we skip the hedged read
when we cannot (easily) find a second DN to read from?
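
To make the first point concrete, here is a minimal sketch of the configuration-driven alternative. It is not the DFSClient code: the class HedgedReadSettings, the flag name "dfs.dfsclient.hedged.read.allowed", and the 500 ms default threshold are all assumptions made for illustration, and a plain Map stands in for the Hadoop configuration object. Only the pool-size and threshold key names are taken from the issue description.

```java
import java.util.Map;

/** Hypothetical holder for hedged-read settings; not the actual DFSClient code. */
final class HedgedReadSettings {
    // ALLOW_KEY is an invented name for the proposed "Allow-Hedged-Reads" switch.
    static final String ALLOW_KEY = "dfs.dfsclient.hedged.read.allowed";
    // These two key names appear in the issue description.
    static final String POOL_SIZE_KEY = "dfs.dfsclient.quorum.read.threadpool.size";
    static final String THRESHOLD_KEY = "dfs.dfsclient.quorum.read.threshold.millis";

    private final boolean enabled;
    private final int poolSize;
    private final long thresholdMillis;

    HedgedReadSettings(Map<String, String> conf) {
        boolean allowed = Boolean.parseBoolean(conf.getOrDefault(ALLOW_KEY, "false"));
        if (allowed) {
            // Pool size and threshold are only loaded when the switch is on.
            poolSize = Integer.parseInt(conf.getOrDefault(POOL_SIZE_KEY, "0"));
            // The 500 ms default is an assumption, not a documented value.
            thresholdMillis = Long.parseLong(conf.getOrDefault(THRESHOLD_KEY, "500"));
        } else {
            poolSize = 0;
            thresholdMillis = 0L;
        }
        // A zero-sized pool can never run a hedged request, so report "disabled".
        enabled = allowed && poolSize > 0;
    }

    // Read-only query; there are deliberately no enable/disable methods.
    boolean isHedgedReadsEnabled() { return enabled; }

    int getPoolSize() { return poolSize; }

    long getThresholdMillis() { return thresholdMillis; }
}
```

Because the state is fixed at construction time, the "pool size 0, then enableHedgedReads" inconsistency cannot arise in this sketch.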
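
For the second point, the guard I have in mind can be sketched as follows. This is a simplified illustration and not the patch's hedgedFetchBlockByteRange: the class HedgedReadSketch and its hedgedRead method are hypothetical, each replica read is modelled as a Callable, and retry and failure handling are left out. The only thing it is meant to show is that the hedged request is issued only when a second replica is already known, so a single-replica read simply waits for the first DN.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a guarded hedged read; not the actual DFSClient code. */
final class HedgedReadSketch {
    /**
     * Reads from replicas.get(0) and, only if a second replica is available and
     * the first has not answered within thresholdMillis, also reads from
     * replicas.get(1). Returns whichever result arrives first.
     * Simplification: a failed read is propagated instead of being retried.
     */
    static <T> T hedgedRead(List<Callable<T>> replicas, long thresholdMillis,
                            ExecutorService pool) throws Exception {
        if (replicas.isEmpty()) {
            throw new IllegalArgumentException("no replica to read from");
        }
        CompletionService<T> cs = new ExecutorCompletionService<>(pool);
        List<Future<T>> submitted = new ArrayList<>();
        submitted.add(cs.submit(replicas.get(0)));
        try {
            if (replicas.size() > 1) {
                // Give the first replica a head start of thresholdMillis.
                Future<T> done = cs.poll(thresholdMillis, TimeUnit.MILLISECONDS);
                if (done != null) {
                    return done.get();
                }
                // First replica is slow and a second one exists: hedge.
                submitted.add(cs.submit(replicas.get(1)));
            }
            // With a single replica we get here directly and just wait for it,
            // rather than looping to look for another node.
            return cs.take().get();
        } finally {
            // Cancel whichever request is still outstanding.
            for (Future<T> f : submitted) {
                f.cancel(true);
            }
        }
    }
}
```

With only one readable DN the call degrades to a plain blocking read, which avoids the repeated chooseDataNode attempts and the extra NN RPCs described above.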

> Support 'hedged' reads in DFSClient
> -----------------------------------
>                 Key: HDFS-5776
>                 URL: https://issues.apache.org/jira/browse/HDFS-5776
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 3.0.0
>            Reporter: Liang Xie
>            Assignee: Liang Xie
>         Attachments: HDFS-5776-v2.txt, HDFS-5776-v3.txt, HDFS-5776-v4.txt, HDFS-5776-v5.txt,
> HDFS-5776-v6.txt, HDFS-5776.txt
> This is a placeholder for the HDFS-related changes backported from https://issues.apache.org/jira/browse/HBASE-7509
> The quorum read ability should be helpful, especially for optimizing read outliers.
> We can use "dfs.dfsclient.quorum.read.threshold.millis" and "dfs.dfsclient.quorum.read.threadpool.size"
> to enable/disable the hedged read ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics
> we can export the metric values of interest into the client system (e.g. HBase's regionserver metrics).
> The core logic is in the pread code path: we decide whether to go to the original fetchBlockByteRange
> or the newly introduced fetchBlockByteRangeSpeculative based on the above config items.

This message was sent by Atlassian JIRA
