hadoop-hdfs-issues mailing list archives

From "Vinayakumar B (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11738) Hedged pread takes more time when block moved from initial locations
Date Tue, 15 Aug 2017 17:53:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127631#comment-16127631 ]

Vinayakumar B commented on HDFS-11738:

Thanks [~jojochuang] for reviewing the changes.
bq.  IIUC, the client would stuck in chooseDataNode() in such a scenario? 
Yeah, the reader thread keeps retrying until the max retries are exhausted, and then gets
{{BlockMissingException}}. But since this is a hedged read, the read would already have completed
from the actual host. So the read completes successfully, but the call returns to the user only
after all retries are exhausted. In the non-hedged case, the read would fail. That was fixed in HDFS-11708.

bq. The method chooseDataNode should add a @Nullable to indicate a null return value is valid.
I tried to add @Nullable, but my IDE started showing a javadoc error. So I added a full
javadoc that mentions the possible null return value. Hope that works for you.
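
For illustration only, a minimal sketch of documenting a null return in javadoc instead of using an annotation. The class {{NullableJavadocSketch}} and the method {{chooseNode}} are hypothetical stand-ins, not the actual chooseDataNode signature from the patch.

```java
import java.util.List;
import java.util.Set;

public class NullableJavadocSketch {
    /**
     * Picks the first candidate that is not in the ignored set.
     * (Hypothetical helper, not the real chooseDataNode.)
     *
     * @param candidates node names to consider, in preference order
     * @param ignored    node names that must be skipped
     * @return the chosen node name, or {@code null} if every candidate is
     *         ignored; callers must check for null before using the result
     */
    public static String chooseNode(List<String> candidates, Set<String> ignored) {
        for (String node : candidates) {
            if (!ignored.contains(node)) {
                return node;
            }
        }
        return null; // valid "nothing available" result, documented above
    }

    public static void main(String[] args) {
        String chosen = chooseNode(List.of("dn1", "dn2"), Set.of("dn1", "dn2"));
        System.out.println(chosen == null ? "no node available" : chosen);
    }
}
```

The javadoc carries the same contract an annotation would, at the cost of not being checked by static analysis tools.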
bq. can be simplified as {{chosenNode = chooseDataNode(block, ignored, false);}}
That's a good catch. Changed.

{quote}The timeout of 30 seconds seems a little short. On my laptop this test takes approximately
20 seconds, so on a busy host the unit test might potentially run slightly over time. Or would
it be reasonable to reduce some wait time?
E.g. reduce dfs.client.retry.window.base from 3000 to 1000?{quote}
Yeah, I increased the test timeout to 60000 ms and reduced dfs.client.retry.window.base to 1000
as well. Thank you for the hint.
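
To give a rough sense of why a smaller window shortens the test, here is a minimal sketch of the total back-off over 3 location refetches. It assumes each wait has the form base * failures + base * (failures + 1) * r, with r random in [0, 1). That formula and the class name {{RetryWindowSketch}} are assumptions for illustration, not quoted from the patch.

```java
public class RetryWindowSketch {
    // Assumed back-off shape: a fixed grace period that grows with the failure
    // count, plus a random slice of an expanding window (r is in [0, 1)).
    public static long waitMillis(long baseMillis, int failures, double r) {
        return (long) (baseMillis * failures + baseMillis * (failures + 1) * r);
    }

    // Sum of the waits across all retries, using the same r for every retry
    // so that the bounds are easy to read off (r = 0.0 gives the minimum).
    public static long totalMillis(long baseMillis, int retries, double r) {
        long total = 0;
        for (int failures = 0; failures < retries; failures++) {
            total += waitMillis(baseMillis, failures, r);
        }
        return total;
    }

    public static void main(String[] args) {
        for (long base : new long[] {3000, 1000}) {
            System.out.printf("base=%d ms: min=%d ms, mean=%d ms%n",
                base, totalMillis(base, 3, 0.0), totalMillis(base, 3, 0.5));
        }
    }
}
```

Under that assumption, a 3000 ms base gives between 9 and 27 seconds in total (about 18 on average), which is in line with the ~20 seconds reported. A 1000 ms base gives between 3 and 9 seconds.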

Please check the updated patch.

> Hedged pread takes more time when block moved from initial locations
> --------------------------------------------------------------------
>                 Key: HDFS-11738
>                 URL: https://issues.apache.org/jira/browse/HDFS-11738
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>            Reporter: Vinayakumar B
>            Assignee: Vinayakumar B
>         Attachments: HDFS-11738-01.patch, HDFS-11738-02.patch, HDFS-11738-03.patch, HDFS-11738-04.patch
> Scenario:
> Same as HDFS-11708.
> During a hedged read:
> 1. The first two locations fail to read the data in hedged mode.
> 2. chooseDataNode refetches locations and adds a future to read from DN3.
> 3. After adding the future for DN3, the main thread goes back to refetching locations in a loop and
> stays stuck there until all 3 retries to fetch locations are exhausted, which consumes ~20 seconds
> with exponential retry time.
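
The scenario above can be sketched in a simplified form. This is a minimal illustration of a hedged read loop in which the main thread returns as soon as any in-flight read succeeds, rather than looping on further retries. It is not the DFSInputStream code: {{HedgedReadSketch}}, {{Replica}}, and {{hedgedRead}} are hypothetical names, and replicas are modelled as plain callables.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class HedgedReadSketch {
    /** Hypothetical stand-in for reading one block from one datanode. */
    public interface Replica {
        String read() throws Exception;
    }

    public static String hedgedRead(List<Replica> replicas, long hedgeDelayMillis)
            throws IOException, InterruptedException {
        ExecutorService pool = Executors.newCachedThreadPool();
        CompletionService<String> completed = new ExecutorCompletionService<>(pool);
        int pending = 0;
        try {
            for (Replica replica : replicas) {
                completed.submit(replica::read);
                pending++;
                // Give outstanding reads a short window before hedging to the next replica.
                Future<String> f = completed.poll(hedgeDelayMillis, TimeUnit.MILLISECONDS);
                while (f != null) {
                    pending--;
                    try {
                        return f.get(); // first successful read wins
                    } catch (ExecutionException e) {
                        // this replica failed; check whether another read has finished
                    }
                    f = pending > 0
                        ? completed.poll(hedgeDelayMillis, TimeUnit.MILLISECONDS)
                        : null;
                }
            }
            // No replicas left to add: wait only on reads that are still in flight.
            while (pending > 0) {
                Future<String> f = completed.take();
                pending--;
                try {
                    return f.get();
                } catch (ExecutionException e) {
                    // keep waiting on the remaining reads
                }
            }
            throw new IOException("all replicas failed");
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Replica> replicas = List.of(
            () -> { throw new IOException("dn1 no longer has the block"); },
            () -> { throw new IOException("dn2 no longer has the block"); },
            () -> "block data from dn3");
        System.out.println(hedgedRead(replicas, 50));
    }
}
```

The point of the sketch is the ordering: once a read has been submitted for the third replica, the caller waits on that future instead of spending time on more location lookups.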

This message was sent by Atlassian JIRA

