hadoop-common-issues mailing list archives

From "Robert Joseph Evans (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-7086) Retrying socket connection failure times can be made as configurable
Date Tue, 31 Jan 2012 15:33:10 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13196981#comment-13196981 ]

Robert Joseph Evans commented on HADOOP-7086:
---------------------------------------------

I am not an expert on the IPC code, but the change looks fairly simple and very reasonable.
I have been bitten by the 45 retries several times, and it is painful to wait that long in
an automated test before it finally fails.  Why is the timeout determined by the server and
not the client?  At least that is what I glean from the code; as I said, I am not an expert
on this, so I could be wrong.  Also, this looks like it changes the protocol between the
client and the server.  Do we have to mark this as an incompatible change and bump the
protocol version number?
                
> Retrying socket connection failure times can be made as configurable
> --------------------------------------------------------------------
>
>                 Key: HADOOP-7086
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7086
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: conf
>         Environment: NA
>            Reporter: Devaraj K
>            Assignee: Devaraj K
>            Priority: Minor
>             Fix For: 0.24.0
>
>         Attachments: HADOOP-7086-1.patch, HADOOP-7086.patch, common-3899.patch
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> The number of retries after a socket connection timeout is hard-coded as 45, so the
> retry message is logged 45 times, as below:
> 2011-01-04 15:14:30,700 INFO ipc.Client (Client.java:handleConnectionFailure(487)) - Retrying connect to server: /10.18.52.124:50020. Already tried 1 time(s).
> This could be made configurable while keeping 45 as the default value. Users who want
> to decrease or increase it can set the property; otherwise the default continues to
> apply.
> common\src\java\org\apache\hadoop\ipc\Client.java:
> -----------------------------------------------------------------------
> private synchronized void setupConnection() throws IOException {
>       short ioFailures = 0;
>       short timeoutFailures = 0;
>       while (true) {
>         try {
>           this.socket = socketFactory.createSocket();
>           this.socket.setTcpNoDelay(tcpNoDelay);
>           // connection time out is 20s
>           NetUtils.connect(this.socket, remoteId.getAddress(), 20000);
>           if (rpcTimeout > 0) {
>             pingInterval = rpcTimeout;  // rpcTimeout overwrites pingInterval
>           }
>           this.socket.setSoTimeout(pingInterval);
>           return;
>         } catch (SocketTimeoutException toe) {
>           /*
>            * The max number of retries is 45, which amounts to 20s*45 = 15
>            * minutes retries.
>            */
>           handleConnectionFailure(timeoutFailures++, 45, toe);
>         } catch (IOException ie) {
>           handleConnectionFailure(ioFailures++, maxRetries, ie);
>         }
>       }
>     }
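
For illustration only, here is a minimal standalone sketch of what reading the timeout retry
limit from configuration might look like. It is not the attached patch. The property key,
the RetryConfigSketch class, the Connector interface, and the use of java.util.Properties in
place of Hadoop's Configuration are all assumptions made for this example; the retry check
is modeled on how the quoted code passes timeoutFailures++ to handleConnectionFailure, and
the one-second sleep and logging between attempts are left out.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.util.Properties;

public class RetryConfigSketch {
    // Assumed property key, chosen only for this illustration.
    public static final String MAX_TIMEOUT_RETRIES_KEY =
        "ipc.client.connect.max.retries.on.timeouts";
    // Default matches the value hard-coded in the quoted setupConnection().
    public static final int DEFAULT_MAX_TIMEOUT_RETRIES = 45;

    /** Hypothetical stand-in for the socket connect step. */
    public interface Connector {
        void connect() throws IOException;
    }

    /** Reads the retry limit, falling back to 45 when the key is absent. */
    public static int maxTimeoutRetries(Properties conf) {
        String value = conf.getProperty(MAX_TIMEOUT_RETRIES_KEY);
        if (value == null) {
            return DEFAULT_MAX_TIMEOUT_RETRIES;
        }
        return Integer.parseInt(value.trim());
    }

    /**
     * Retries the connect step on timeouts. Once the number of earlier
     * timeout failures has reached maxRetries, the next timeout is rethrown.
     * Returns the number of timeout failures seen before success.
     */
    public static int connectWithRetries(Connector connector, int maxRetries)
            throws IOException {
        int timeoutFailures = 0;
        while (true) {
            try {
                connector.connect();
                return timeoutFailures;
            } catch (SocketTimeoutException toe) {
                if (timeoutFailures >= maxRetries) {
                    throw toe; // limit reached, give up
                }
                timeoutFailures++;
            }
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(MAX_TIMEOUT_RETRIES_KEY, "2"); // e.g. for a fast-failing test
        final int[] attempts = {0};
        try {
            connectWithRetries(() -> {
                attempts[0]++;
                throw new SocketTimeoutException("simulated timeout");
            }, maxTimeoutRetries(conf));
        } catch (IOException e) {
            System.out.println("Gave up after " + attempts[0] + " attempts");
        }
    }
}
```

With a limit like this read on the client side, an automated test could set a small value and
fail quickly instead of waiting through all 45 attempts.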

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
