hadoop-mapreduce-user mailing list archives

From Oren Marmor <or...@infolinks.com>
Subject HA Jobtracker failure
Date Thu, 23 Jan 2014 16:11:41 GMT
Hi.
We have two HA JobTrackers in active/standby mode (CDH 4.2 on Ubuntu Server).
We had a problem in which the active node suddenly became standby and the
standby server attempted to start, resulting in a Java heap space failure.
Any ideas as to why the active node turned to standby?

Logs attached.
On the (original) active node:
2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobTracker: Initializing job_201401041634_5858
2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobInProgress: Initializing job_201401041634_5858
*2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to standby*
2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: Stopping pluginDispatcher
2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: Stopping infoServer
2014-01-22 06:50:44,093 WARN org.apache.hadoop.ipc.Client: interrupted waiting to send params to server
java.lang.InterruptedException
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:979)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
        at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
        at java.util.concurrent.FutureTask.get(FutureTask.java:83)
        at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:913)
        at org.apache.hadoop.ipc.Client.call(Client.java:1198)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
        at $Proxy9.getFileInfo(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:628)
        at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
        at $Proxy10.getFileInfo(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1532)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:803)
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332)
        at org.apache.hadoop.mapred.JobTrackerHAServiceProtocol$SystemDirectoryMonitor.run(JobTrackerHAServiceProtocol.java:96)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
2014-01-22 06:51:55,637 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:50031

On the standby node:
2014-01-22 06:50:05,010 INFO org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to active
2014-01-22 06:50:05,010 INFO org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopping JobTrackerHAHttpRedirector on port 50030
2014-01-22 06:50:05,098 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:50030
2014-01-22 06:50:05,198 INFO org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopped
2014-01-22 06:50:05,201 INFO org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Renaming previous system directory hdfs://***/tmp/mapred/system/seq-000000000022 to hdfs://taykey/tmp/mapred/system/seq-000000000023
2014-01-22 06:50:05,244 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
2014-01-22 06:50:05,248 INFO org.apache.hadoop.mapred.JobTracker: Scheduler configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT, limitMaxMemForMapTasks, limitMaxMemForReduceTasks) (-1, -1, -1, -1)
2014-01-22 06:50:05,248 INFO org.apache.hadoop.util.HostsFileReader: Refreshing hosts (include/exclude) list
2014-01-22 06:50:11,839 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Starting expired delegation token remover thread, tokenRemoverScanInterval=60 min(s)
...
2014-01-22 06:52:00,870 INFO org.apache.hadoop.mapred.JobTracker: Starting RUNNING
2014-01-22 06:52:06,560 INFO org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioned to active
2014-01-22 06:52:06,560 WARN org.apache.hadoop.ipc.Server: IPC Server Responder, call org.apache.hadoop.ha.HAServiceProtocol.transitionToActive from ****:32931: output error
2014-01-22 06:52:06,561 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8023 caught an exception
java.nio.channels.ClosedChannelException
        at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
        at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
        at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
        at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
        at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
2014-01-22 06:52:13,168 WARN org.apache.hadoop.ipc.Server: IPC Server Responder, call org.apache.hadoop.ha.HAServiceProtocol.getServiceStatus from ****:60965: output error
2014-01-22 06:52:13,168 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8023 caught an exception
java.nio.channels.ClosedChannelException
        at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
        at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
        at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
        at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
        at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
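
To make the two excerpts easier to compare, the key events can be merged into one timeline. This is only a rough sketch: it assumes the clocks on both nodes are synchronized (not verified), the node labels and the `merge_timeline` helper are made up for illustration, and the event messages are shortened from the log lines above.

```python
from datetime import datetime

# Key events copied from the two log excerpts above.
# The "active"/"standby" labels refer to each node's ORIGINAL role.
EVENTS = [
    ("active", "2014-01-22 06:50:27,386", "Transitioning to standby"),
    ("active", "2014-01-22 06:51:55,637", "Stopped SelectChannelConnector (50031)"),
    ("standby", "2014-01-22 06:50:05,010", "Transitioning to active"),
    ("standby", "2014-01-22 06:52:06,560", "Transitioned to active"),
    ("standby", "2014-01-22 06:52:06,560", "transitionToActive: output error"),
    ("standby", "2014-01-22 06:52:13,168", "getServiceStatus: output error"),
]

LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"  # log4j-style timestamp with milliseconds


def merge_timeline(events):
    """Parse the timestamps and return (time, node, message) tuples in time order."""
    parsed = [
        (datetime.strptime(stamp, LOG_TIME_FORMAT), node, message)
        for node, stamp, message in events
    ]
    return sorted(parsed)


timeline = merge_timeline(EVENTS)
for when, node, message in timeline:
    print(when.strftime("%H:%M:%S.%f")[:-3], f"{node:8}", message)

# How long the standby spent between "Transitioning" and "Transitioned".
start = next(t for t, n, m in timeline if m == "Transitioning to active")
end = next(t for t, n, m in timeline if m == "Transitioned to active")
took = (end - start).total_seconds()
print(f"standby transition took {took:.2f} s")
```

If the clocks do agree, the standby logged "Transitioning to active" about 22 seconds before the active node logged "Transitioning to standby", and the transition itself took roughly two minutes. The ClosedChannelException on the transitionToActive reply might mean the caller had already given up waiting, but the excerpts do not confirm that.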

thanks
Oren
