hadoop-common-dev mailing list archives

From "Amareshwari Sriramadasu (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-3027) JobTracker shuts down during initialization if the NameNode is down
Date Mon, 24 Mar 2008 12:31:24 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HADOOP-3027:
--------------------------------------------

    Attachment: patch-3027.txt

In FileSystem.Cache.get, if fs is null and the cache is empty, a shutdown hook is added to close
all filesystems.

When the NameNode is down and the JobTracker tries to connect, the FileSystem cache is empty, so
the shutdown hook is added on the first attempt. Since the NameNode is down, createFileSystem fails.
When the JobTracker tries again, fs is null and the cache is still empty, so it tries to add the
shutdown hook again. This throws an IllegalArgumentException saying "Hook previously registered".
The solution could be to add the shutdown hook only once createFileSystem succeeds. Here is a patch
doing the same.
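
The ordering problem described above can be illustrated with a minimal, self-contained sketch.
This is not the code from patch-3027.txt or from Hadoop's FileSystem class: the names
ShutdownHookOrderSketch, Fs, FsFactory, getBuggy and getFixed are hypothetical stand-ins, and
the only real API used is java.lang.Runtime.addShutdownHook, which rejects a hook that is
already registered.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

class ShutdownHookOrderSketch {
    /** Hypothetical stand-in for a filesystem client; not the Hadoop class. */
    static class Fs {}

    /** Hypothetical factory; it throws while the (simulated) NameNode is down. */
    interface FsFactory {
        Fs create() throws IOException;
    }

    /** Simplified cache, assumed to mirror only the ordering described in the message. */
    static class Cache {
        private final Map<String, Fs> map = new HashMap<>();
        // A single hook instance is reused, so registering it twice is an error.
        private final Thread clientFinalizer = new Thread(() -> map.clear());

        /** Order described as buggy: the hook is registered before creation succeeds. */
        synchronized Fs getBuggy(String key, FsFactory factory) throws IOException {
            Fs fs = map.get(key);
            if (fs == null) {
                if (map.isEmpty()) {
                    Runtime.getRuntime().addShutdownHook(clientFinalizer);
                }
                fs = factory.create(); // if this throws, the cache stays empty
                map.put(key, fs);
            }
            return fs;
        }

        /** Proposed order: the hook is registered only after creation succeeds. */
        synchronized Fs getFixed(String key, FsFactory factory) throws IOException {
            Fs fs = map.get(key);
            if (fs == null) {
                fs = factory.create(); // may throw; nothing has been registered yet
                if (map.isEmpty()) {
                    Runtime.getRuntime().addShutdownHook(clientFinalizer);
                }
                map.put(key, fs);
            }
            return fs;
        }
    }

    public static void main(String[] args) throws IOException {
        FsFactory down = () -> { throw new IOException("Connection refused"); };

        Cache buggy = new Cache();
        try {
            buggy.getBuggy("hdfs", down);
        } catch (IOException e) {
            // First attempt fails, but the hook is already registered.
        }
        try {
            buggy.getBuggy("hdfs", down);
        } catch (IllegalArgumentException e) {
            System.out.println("buggy order, second attempt: " + e.getMessage());
        }

        Cache fixed = new Cache();
        for (int attempt = 0; attempt < 2; attempt++) {
            try {
                fixed.getFixed("hdfs", down);
            } catch (IOException e) {
                // Each failed attempt leaves no hook behind, so retrying is safe.
            }
        }
        Fs fs = fixed.getFixed("hdfs", Fs::new); // the NameNode is "up" now
        System.out.println("fixed order, created after retries: " + (fs != null));
    }
}
```

In the sketch, the second getBuggy call fails with an IllegalArgumentException before the factory
is even called, while getFixed can be retried any number of times and registers the hook once,
on the first successful creation.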

> JobTracker shuts down during initialization if the NameNode is down
> -------------------------------------------------------------------
>
>                 Key: HADOOP-3027
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3027
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs, mapred
>    Affects Versions: 0.16.0
>            Reporter: Amareshwari Sriramadasu
>            Priority: Blocker
>             Fix For: 0.17.0
>
>         Attachments: patch-3027.txt
>
>
> When the JobTracker is initializing and trying to connect to the NameNode, it shuts itself
> down if the NameNode is unreachable for more than one iteration of the connect loop. It can
> be easily reproduced if the JobTracker is started before the NameNode is started. The JobTracker
> will shut itself down in a few seconds. The problem seems to be with adding a shutdown hook
> in the FileSystem in the case where the same hook has been added before.
> 2008-03-17 09:45:20,979 INFO org.apache.hadoop.mapred.JobTracker: JobTracker up at: 9101
> 2008-03-17 09:45:20,979 INFO org.apache.hadoop.mapred.JobTracker: JobTracker webserver: 50030
> 2008-03-17 09:45:21,374 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 1 time(s).
> 2008-03-17 09:45:22,377 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 2 time(s).
> 2008-03-17 09:45:23,380 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 3 time(s).
> 2008-03-17 09:45:24,383 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 4 time(s).
> 2008-03-17 09:45:25,385 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 5 time(s).
> 2008-03-17 09:45:26,388 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 6 time(s).
> 2008-03-17 09:45:27,391 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 7 time(s).
> 2008-03-17 09:45:28,394 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 8 time(s).
> 2008-03-17 09:45:29,397 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 9 time(s).
> 2008-03-17 09:45:30,402 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 10 time(s).
> 2008-03-17 09:45:31,406 INFO org.apache.hadoop.mapred.JobTracker: problem cleaning system directory: /tmp/hadoop/mapred/system
> java.net.ConnectException: Connection refused
> 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> 	at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
> 	at sun.nio.ch.SocketAdaptor.connect(Unknown Source)
> 	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:174)
> 	at org.apache.hadoop.ipc.Client.getConnection(Client.java:623)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:546)
> 	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:211)
> 	at org.apache.hadoop.dfs.$Proxy4.getProtocolVersion(Unknown Source)
> 	at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:312)
> 	at org.apache.hadoop.dfs.DFSClient.createRPCNamenode(DFSClient.java:94)
> 	at org.apache.hadoop.dfs.DFSClient.<init>(DFSClient.java:158)
> 	at org.apache.hadoop.dfs.DistributedFileSystem.initialize(DistributedFileSystem.java:69)
> 	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1255)
> 	at org.apache.hadoop.fs.FileSystem.access$400(FileSystem.java:53)
> 	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1272)
> 	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:191)
> 	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:96)
> 	at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:702)
> 	at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:135)
> 	at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:2266)
> 2008-03-17 09:45:41,410 FATAL org.apache.hadoop.mapred.JobTracker: java.lang.IllegalArgumentException: Hook previously registered
> 	at java.lang.ApplicationShutdownHooks.add(Unknown Source)
> 	at java.lang.Runtime.addShutdownHook(Unknown Source)
> 	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1269)
> 	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:191)
> 	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:96)
> 	at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:702)
> 	at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:135)
> 	at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:2266)
> 2008-03-17 09:45:41,412 INFO org.apache.hadoop.mapred.JobTracker: SHUTDOWN_MSG: 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

