hadoop-common-dev mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4659) Root cause of connection failure is being lost to code that uses it for delaying startup
Date Fri, 14 Nov 2008 15:20:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647627#action_12647627
] 

Steve Loughran commented on HADOOP-4659:
----------------------------------------

The problem could be (I repeat: could be) from HADOOP-2188, though I'm not sure. There have
been too many changes to roll back, and it's easier to go forwards.

I have a patch that (correctly) puts the task tracker back to retrying:
[sf-startdaemon-debug] 08/11/14 15:06:43 [TaskTracker] INFO ipc.Client : Retrying connect
to server: localhost/127.0.0.1:8012. Already tried 5 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:43 [Thread-41] INFO datanode.DataNode : BlockReport
of 0 blocks got processed in 1 msecs
[sf-startdaemon-debug] 08/11/14 15:06:44 [TaskTracker] INFO ipc.Client : Retrying connect
to server: localhost/127.0.0.1:8012. Already tried 6 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:45 [TaskTracker] INFO ipc.Client : Retrying connect
to server: localhost/127.0.0.1:8012. Already tried 7 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:46 [TaskTracker] INFO ipc.Client : Retrying connect
to server: localhost/127.0.0.1:8012. Already tried 8 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:47 [TaskTracker] INFO ipc.Client : Retrying connect
to server: localhost/127.0.0.1:8012. Already tried 9 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:47 [TaskTracker] INFO ipc.RPC : Server at localhost/127.0.0.1:8012
not available yet, Zzzzz...
[sf-startdaemon-debug] 08/11/14 15:06:49 [TaskTracker] INFO ipc.Client : Retrying connect
to server: localhost/127.0.0.1:8012. Already tried 0 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:50 [TaskTracker] INFO ipc.Client : Retrying connect
to server: localhost/127.0.0.1:8012. Already tried 1 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:51 [TaskTracker] INFO ipc.Client : Retrying connect
to server: localhost/127.0.0.1:8012. Already tried 2 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:52 [TaskTracker] INFO ipc.Client : Retrying connect
to server: localhost/127.0.0.1:8012. Already tried 3 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:53 [TaskTracker] INFO ipc.Client : Retrying connect
to server: localhost/127.0.0.1:8012. Already tried 4 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:53 [Thread-41] INFO datanode.DataNode : BlockReport
of 0 blocks got processed in 1 msecs
[sf-startdaemon-debug] 08/11/14 15:06:54 [TaskTracker] INFO ipc.Client : Retrying connect
to server: localhost/127.0.0.1:8012. Already tried 5 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:55 [TaskTracker] INFO ipc.Client : Retrying connect
to server: localhost/127.0.0.1:8012. Already tried 6 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:56 [TaskTracker] INFO ipc.Client : Retrying connect
to server: localhost/127.0.0.1:8012. Already tried 7 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:57 [TaskTracker] INFO ipc.Client : Retrying connect
to server: localhost/127.0.0.1:8012. Already tried 8 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:58 [TaskTracker] INFO ipc.Client : Retrying connect
to server: localhost/127.0.0.1:8012. Already tried 9 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:58 [TaskTracker] INFO ipc.RPC : Server at localhost/127.0.0.1:8012
not available yet, Zzzzz...
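
For reference, the behaviour the log shows is a probe-sleep-retry loop during startup: a
connection refusal is read as "server not up yet", so the tracker logs, sleeps, and tries
again. A minimal, hypothetical sketch of that pattern (illustrative class and method names,
not the actual TaskTracker/ipc.RPC code):

    import java.io.IOException;
    import java.net.ConnectException;
    import java.net.InetSocketAddress;
    import java.net.Socket;

    // Illustration only: block until a server is reachable, sleeping on refusals.
    public class WaitForServer {

        public static void waitUntilUp(InetSocketAddress addr, long sleepMillis)
                throws InterruptedException {
            while (true) {
                try (Socket probe = new Socket()) {
                    probe.connect(addr, 1000);   // try to reach the server
                    return;                      // connected: server is up
                } catch (ConnectException e) {
                    // Root cause says "nobody listening yet": keep waiting.
                    System.out.println("Server at " + addr + " not available yet, Zzzzz...");
                } catch (IOException e) {
                    // Anything else is a real error; don't spin forever on it.
                    throw new RuntimeException("Unexpected failure probing " + addr, e);
                }
                Thread.sleep(sleepMillis);
            }
        }
    }

The loop only works while the refusal is still recognisable as a ConnectException by the time
it reaches the caller, which is exactly what this issue is about.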


> Root cause of connection failure is being lost to code that uses it for delaying startup
> ----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4659
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4659
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: ipc
>    Affects Versions: 0.19.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>
> In ipc.Client, the root cause of a connection failure is being lost as the exception is wrapped,
so the outside code that looks for that root cause isn't working as expected. The result is
that you can't bring up a task tracker before the job tracker, and probably the same for a
datanode before a namenode. The change that triggered this has not yet been located; I had
thought it was HADOOP-3844, but I no longer believe this is the case.
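
As an illustration of the failure mode described above (a hedged sketch, not the actual
ipc.Client code): if the wrapper only folds the original exception into the message string,
a caller walking getCause() can no longer see the ConnectException, whereas chaining the
cause keeps that check working.

    import java.io.IOException;
    import java.net.ConnectException;

    // Illustration only: why a caller that looks for ConnectException as the
    // root cause stops working once the wrapper drops the cause chain.
    public class RootCauseDemo {

        // Lossy wrapping: the original message survives, the cause chain does not.
        static IOException wrapLossy(IOException e) {
            return new IOException("Call failed: " + e);
        }

        // Cause-preserving wrapping: getCause() still returns the original exception.
        static IOException wrapWithCause(IOException e) {
            return new IOException("Call failed: " + e.getMessage(), e);
        }

        // The kind of check the delayed-startup code relies on:
        // walk the cause chain looking for a ConnectException.
        static boolean isConnectionRefused(Throwable t) {
            for (Throwable cur = t; cur != null; cur = cur.getCause()) {
                if (cur instanceof ConnectException) {
                    return true;
                }
            }
            return false;
        }

        public static void main(String[] args) {
            ConnectException refused = new ConnectException("Connection refused");
            System.out.println(isConnectionRefused(wrapLossy(refused)));     // false
            System.out.println(isConnectionRefused(wrapWithCause(refused))); // true
        }
    }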

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

