hadoop-mapreduce-user mailing list archives

From Karthik Kambatla <ka...@cloudera.com>
Subject Re: HA Jobtracker failure
Date Mon, 27 Jan 2014 21:59:58 GMT
(Redirecting to cdh-user, moving user@hadoop to bcc).

Hi Oren,

Can you attach slightly longer versions of the log files from both JTs?
Also, if this is something recurring, it would be helpful to monitor the JT
heap usage and GC activity using jstat -gcutil <jt-pid>.
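A rough sketch of that monitoring step (hypothetical: <jt-pid> is a placeholder
for the JobTracker's process id, which you'd look up first, e.g. with jps):

```shell
# Find the JobTracker JVM's PID (jps lists Java processes by main class)
jps | grep JobTracker

# Sample heap-region utilization (%) and GC counts/times every 5 seconds.
# A steadily rising O (old gen) column together with a climbing FGC (full
# GC count) is the usual precursor to a java heap space failure.
jstat -gcutil <jt-pid> 5000
```

This is only a diagnostic sketch; the sampling interval and how long you leave it running depend on how often the problem recurs.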

Thanks,
Karthik

On Thu, Jan 23, 2014 at 8:11 AM, Oren Marmor <orenm@infolinks.com> wrote:

> Hi,
> We have two HA JobTrackers in active/standby mode (CDH4.2 on Ubuntu
> Server).
> We had a problem during which the active node suddenly became standby, and
> the standby node's attempt to become active failed with a Java heap space
> error.
> Any ideas as to why the active node transitioned to standby?
>
> logs attached:
> on (original) active node:
> 2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobTracker:
> Initializing job_201401041634_5858
> 2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobInProgress:
> Initializing job_201401041634_5858
> *2014-01-22 06:50:27,386 INFO
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to
> standby*
> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: Stopping
> pluginDispatcher
> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: Stopping
> infoServer
> 2014-01-22 06:50:44,093 WARN org.apache.hadoop.ipc.Client: interrupted
> waiting to send params to server
> java.lang.InterruptedException
>         at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:979)
>         at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
>         at
> java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
>         at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>         at
> org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:913)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1198)
>         at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>         at $Proxy9.getFileInfo(Unknown Source)
>         at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:628)
>         at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
>         at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
>         at $Proxy10.getFileInfo(Unknown Source)
>         at
> org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1532)
>         at
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:803)
>         at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332)
>         at
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol$SystemDirectoryMonitor.run(JobTrackerHAServiceProtocol.java:96)
>         at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>         at
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>         at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>         at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>         at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> 2014-01-22 06:51:55,637 INFO org.mortbay.log: Stopped
> SelectChannelConnector@0.0.0.0:50031
>
> on standby node
> 2014-01-22 06:50:05,010 INFO
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to
> active
> 2014-01-22 06:50:05,010 INFO
> org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopping
> JobTrackerHAHttpRedirector on port 50030
> 2014-01-22 06:50:05,098 INFO org.mortbay.log: Stopped
> SelectChannelConnector@0.0.0.0:50030
> 2014-01-22 06:50:05,198 INFO
> org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopped
> 2014-01-22 06:50:05,201 INFO
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Renaming previous
> system directory hdfs://***/tmp/mapred/system/seq-000000000022 to
> hdfs://taykey/tmp/mapred/system/seq-000000000023
> 2014-01-22 06:50:05,244 INFO
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
> Updating the current master key for generating delegation tokens
> 2014-01-22 06:50:05,248 INFO org.apache.hadoop.mapred.JobTracker:
> Scheduler configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT,
> limitMaxMemForMapTasks, limitMaxMemForReduceTasks) (-1, -1, -1, -1)
> 2014-01-22 06:50:05,248 INFO org.apache.hadoop.util.HostsFileReader:
> Refreshing hosts (include/exclude) list
> 2014-01-22 06:50:11,839 INFO
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
> Starting expired delegation token remover thread,
> tokenRemoverScanInterval=60 min(s)
> ...
> 2014-01-22 06:52:00,870 INFO org.apache.hadoop.mapred.JobTracker: Starting
> RUNNING
> 2014-01-22 06:52:06,560 INFO
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioned to active
> 2014-01-22 06:52:06,560 WARN org.apache.hadoop.ipc.Server: IPC Server
> Responder, call org.apache.hadoop.ha.HAServiceProtocol.transitionToActive
> from ****:32931: output error
> 2014-01-22 06:52:06,561 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 0 on 8023 caught an exception
> java.nio.channels.ClosedChannelException
>         at
> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
>         at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
>         at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
>         at
> org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
>         at
> org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
> 2014-01-22 06:52:13,168 WARN org.apache.hadoop.ipc.Server: IPC Server
> Responder, call org.apache.hadoop.ha.HAServiceProtocol.getServiceStatus
> from ****:60965: output error
> 2014-01-22 06:52:13,168 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 0 on 8023 caught an exception
> java.nio.channels.ClosedChannelException
>         at
> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
>         at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
>         at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
>         at
> org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
>         at
> org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
>
> Thanks,
> Oren
>
