hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13767) Aliyun Connection broken when idle then 1 minutes or build than 3 hours
Date Fri, 28 Oct 2016 11:50:58 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15615177#comment-15615177 ]

Steve Loughran commented on HADOOP-13767:

* Is there a stack trace here?
* What was the operation? GET, PUT?

If you look at s3a, we treat a failure on a read as something to retry: just re-open the stream
as if it were a seek(). Same for a PUT, where, although the AWS library does some retries, it
doesn't redo them all. I went for a hard-coded retry policy class for backoff retry (see {{S3ABlockOutputStream}}):

RetryPolicy retryPolicy =

It is hard-coded to avoid more config options to document and test, while setting things up
so that it can be made configurable in future.
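
As a rough illustration of the hard-coded backoff retry idea, here is a minimal sketch in plain Java. It is not the s3a code: {{ProportionalRetry}}, its constructor arguments and {{sleepMillisFor}} are hypothetical names made up for this example, and the "sleep grows in proportion to the retry number" rule is an assumption about what a simple backoff could look like.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/**
 * Hypothetical helper (not part of Hadoop): retries an operation that
 * fails with an IOException, sleeping longer before each new attempt.
 */
final class ProportionalRetry {
  private final int maxRetries;       // hard-coded by the caller, not read from config
  private final long baseSleepMillis; // sleep before retry n is n * baseSleepMillis

  ProportionalRetry(int maxRetries, long baseSleepMillis) {
    this.maxRetries = maxRetries;
    this.baseSleepMillis = baseSleepMillis;
  }

  /** Sleep time before the given retry; retry numbers start at 1. */
  long sleepMillisFor(int retry) {
    return baseSleepMillis * retry;
  }

  /** Runs op, retrying on IOException until maxRetries retries have failed. */
  <T> T call(Callable<T> op) throws Exception {
    int failures = 0;
    while (true) {
      try {
        return op.call();
      } catch (IOException e) {
        failures++;
        if (failures > maxRetries) {
          throw e; // give up and surface the last failure
        }
        Thread.sleep(sleepMillisFor(failures));
      }
    }
  }
}
```

A caller would construct this once with fixed values and wrap each GET or PUT in {{call()}}; because the limits are passed through the constructor, making them configurable later would not change the call sites.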

Consider also adding a counter of disconnects/reconnects to the metrics, as a high count could be
a sign of connectivity issues. And when fielding support calls asking "why is IO slow", "your
network is unreliable" is a good answer.
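
To make the re-open-on-read-failure and reconnect counter suggestions concrete, here is a hedged sketch in plain Java. None of these names come from s3a or the Aliyun connector: {{RangeOpener}}, {{ReconnectingInputStream}} and {{getReconnectCount()}} are hypothetical, and the policy of a single re-open per failed read is an assumption chosen to keep the example short.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical: opens the remote object starting at the given byte offset. */
interface RangeOpener {
  InputStream open(long offset) throws IOException;
}

/**
 * Hypothetical wrapper: on a failed read it re-opens the object at the
 * current position (much like a seek) and counts each reconnect.
 */
final class ReconnectingInputStream extends InputStream {
  private final RangeOpener opener;
  private final AtomicLong reconnects = new AtomicLong();
  private InputStream in;
  private long pos; // number of bytes returned to the caller so far

  ReconnectingInputStream(RangeOpener opener) throws IOException {
    this.opener = opener;
    this.in = opener.open(0);
  }

  @Override
  public int read() throws IOException {
    int b;
    try {
      b = in.read();
    } catch (IOException e) {
      // Connection dropped: count it, then re-open at the current offset.
      reconnects.incrementAndGet();
      try {
        in.close();
      } catch (IOException ignored) {
        // the old connection is already broken
      }
      in = opener.open(pos);
      b = in.read(); // a second failure propagates to the caller
    }
    if (b >= 0) {
      pos++;
    }
    return b;
  }

  /** Value that could be published as a metric. */
  long getReconnectCount() {
    return reconnects.get();
  }

  @Override
  public void close() throws IOException {
    in.close();
  }
}
```

Exposing {{getReconnectCount()}} through whatever metrics system the filesystem already uses would give support a direct number to look at when IO is slow.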

> Aliyun Connection broken when idle then 1 minutes or build than 3 hours
> -----------------------------------------------------------------------
>                 Key: HADOOP-13767
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13767
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs
>    Affects Versions: 3.0.0-alpha2
>            Reporter: Genmao Yu
>            Assignee: Genmao Yu
>             Fix For: 3.0.0-alpha2

This message was sent by Atlassian JIRA

