hadoop-common-dev mailing list archives

From "Hairong Kuang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4659) Root cause of connection failure is being lost to code that uses it for delaying startup
Date Tue, 18 Nov 2008 20:09:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648734#action_12648734 ]

Hairong Kuang commented on HADOOP-4659:
---------------------------------------

On second thought, throwing IOException from setupIOstreams is not a complete solution.
While one RPC is in the middle of setupIOstreams, a different call may be using the same
connection and be about to enter setupIOstreams itself. If the first setup gets a ConnectException,
the second call ends up seeing a closed connection, and the ConnectException is delivered to that
call through call.error. This means the ConnectException will still be wrapped.

I am thinking of solving the problem with my initial approach: check the root cause in waitForProxy.
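
As a minimal sketch of that check, assuming a small helper inside RPC (the name
isCausedByConnectFailure and the simplified retry loop below are my illustration, not the
attached patch):

    // Sketch only: walk the cause chain so that a wrapped
    // ConnectException is still recognized as a connection failure.
    private static boolean isCausedByConnectFailure(Throwable t) {
      while (t != null) {
        if (t instanceof java.net.ConnectException) {
          return true;
        }
        t = t.getCause();
      }
      return false;
    }

    // Where it would plug into waitForProxy's retry loop (simplified):
    //   try {
    //     return getProxy(protocol, clientVersion, addr, conf);
    //   } catch (IOException ioe) {
    //     if (!isCausedByConnectFailure(ioe)) {
    //       throw ioe;          // unrelated failure: fail fast
    //     }
    //     Thread.sleep(1000);   // server not up yet: wait and retry
    //   }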

As for testing, I like Steve's idea. How about adding a waitForProxy API that takes a timeout
to RPC? The current waitForProxy could then delegate to it with a timeout of Long.MAX_VALUE.

> Root cause of connection failure is being lost to code that uses it for delaying startup
> ----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4659
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4659
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: ipc
>    Affects Versions: 0.18.3
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Blocker
>             Fix For: 0.18.3
>
>         Attachments: connectRetry.patch, hadoop-4659.patch
>
>
> In ipc.Client the root cause of a connection failure is being lost as the exception is wrapped,
> so the outside code, the code that looks for that root cause, isn't working as expected.
> The result is that you can't bring up a task tracker before the job tracker, and probably the
> same for a datanode before a namenode. The change that triggered this has not yet been located;
> I had thought it was HADOOP-3844, but I no longer believe that is the case.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

