hadoop-mapreduce-user mailing list archives

From Siddharth Tiwari <siddharth.tiw...@live.com>
Subject Re: HA Jobtracker failure
Date Mon, 27 Jan 2014 22:04:12 GMT
How have you implemented the failover? Also, can you attach the JT HA logs? If you have implemented
it using ZKFC, it would be interesting to look at the ZooKeeper logs as well.

> On Jan 27, 2014, at 3:00 PM, "Karthik Kambatla" <kasha@cloudera.com> wrote:
> 
> (Redirecting to cdh-user, moving user@hadoop to bcc).
> 
> Hi Oren
> 
> Can you attach slightly longer versions of the log files on both the JTs? Also, if this is something recurring, it would be nice to monitor the JT heap usage and GC timeouts using jstat -gcutil <jt-pid>.
> 
> Thanks
> Karthik
> 
> 
> 
> 
>> On Thu, Jan 23, 2014 at 8:11 AM, Oren Marmor <orenm@infolinks.com> wrote:
>> Hi.
>> We have two HA JobTrackers in active/standby mode (CDH 4.2 on Ubuntu Server).
>> We had a problem during which the active node suddenly became standby, and the standby server attempted to start, resulting in a Java heap space failure.
>> Any ideas as to why the active node turned to standby?
>> 
>> logs attached:
>> on (original) active node:
>> 2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobTracker: Initializing job_201401041634_5858
>> 2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobInProgress: Initializing job_201401041634_5858
>> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to standby
>> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: Stopping pluginDispatcher
>> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: Stopping infoServer
>> 2014-01-22 06:50:44,093 WARN org.apache.hadoop.ipc.Client: interrupted waiting to send params to server
>> java.lang.InterruptedException
>>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:979)
>>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
>>         at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
>>         at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>>         at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:913)
>>         at org.apache.hadoop.ipc.Client.call(Client.java:1198)
>>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>         at $Proxy9.getFileInfo(Unknown Source)
>>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:628)
>>         at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
>>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
>>         at $Proxy10.getFileInfo(Unknown Source)
>>         at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1532)
>>         at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:803)
>>         at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332)
>>         at org.apache.hadoop.mapred.JobTrackerHAServiceProtocol$SystemDirectoryMonitor.run(JobTrackerHAServiceProtocol.java:96)
>>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>>         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>         at java.lang.Thread.run(Thread.java:662)
>> 2014-01-22 06:51:55,637 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:50031
>> 
>> on standby node
>> 2014-01-22 06:50:05,010 INFO org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to active
>> 2014-01-22 06:50:05,010 INFO org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopping JobTrackerHAHttpRedirector on port 50030
>> 2014-01-22 06:50:05,098 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:50030
>> 2014-01-22 06:50:05,198 INFO org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopped
>> 2014-01-22 06:50:05,201 INFO org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Renaming previous system directory hdfs://***/tmp/mapred/system/seq-000000000022 to hdfs://taykey/tmp/mapred/system/seq-000000000023
>> 2014-01-22 06:50:05,244 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
>> 2014-01-22 06:50:05,248 INFO org.apache.hadoop.mapred.JobTracker: Scheduler configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT, limitMaxMemForMapTasks, limitMaxMemForReduceTasks) (-1, -1, -1, -1)
>> 2014-01-22 06:50:05,248 INFO org.apache.hadoop.util.HostsFileReader: Refreshing hosts (include/exclude) list
>> 2014-01-22 06:50:11,839 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Starting expired delegation token remover thread, tokenRemoverScanInterval=60 min(s)
>> ...
>> 2014-01-22 06:52:00,870 INFO org.apache.hadoop.mapred.JobTracker: Starting RUNNING
>> 2014-01-22 06:52:06,560 INFO org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioned to active
>> 2014-01-22 06:52:06,560 WARN org.apache.hadoop.ipc.Server: IPC Server Responder, call org.apache.hadoop.ha.HAServiceProtocol.transitionToActive from ****:32931: output error
>> 2014-01-22 06:52:06,561 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8023 caught an exception
>> java.nio.channels.ClosedChannelException
>>         at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
>>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
>>         at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
>>         at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
>>         at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
>>         at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
>>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
>> 2014-01-22 06:52:13,168 WARN org.apache.hadoop.ipc.Server: IPC Server Responder, call org.apache.hadoop.ha.HAServiceProtocol.getServiceStatus from ****:60965: output error
>> 2014-01-22 06:52:13,168 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8023 caught an exception
>> java.nio.channels.ClosedChannelException
>>         at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
>>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
>>         at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
>>         at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
>>         at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
>>         at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
>>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
>> 
>> thanks
>> Oren
> 
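
For the jstat -gcutil <jt-pid> suggestion quoted above, here is a minimal Python sketch of how the sampled output could be collected and checked for a nearly full old generation. It is only an illustration, not something from the thread: the function names (parse_gcutil, old_gen_alerts, sample_jt) and the 90% threshold are assumptions, and the column set printed by jstat depends on the JVM version (older JVMs print a P column for the permanent generation, newer ones print M and CCS instead), so the parser reads column names from the header rather than hard-coding them.

```python
import subprocess


def parse_gcutil(text):
    """Parse `jstat -gcutil` text output into a list of {column: float} dicts.

    Column names are taken from the header line, so the parser does not
    depend on which columns a particular JVM version prints.
    """
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    if not lines:
        return []
    header, rows = lines[0], lines[1:]
    samples = []
    for row in rows:
        if len(row) != len(header):
            continue  # skip truncated or malformed lines
        try:
            samples.append({name: float(val) for name, val in zip(header, row)})
        except ValueError:
            continue  # skip repeated headers or non-numeric placeholders
    return samples


def old_gen_alerts(samples, threshold=90.0):
    """Return the samples whose old-generation utilisation (column O, in %)
    is at or above `threshold`. The 90% default is an arbitrary assumption."""
    return [s for s in samples if s.get("O", 0.0) >= threshold]


def sample_jt(pid, interval_ms=5000, count=12):
    """Run `jstat -gcutil <pid> <interval> <count>` and parse its output.

    Requires the JDK's jstat on PATH and permission to attach to the
    JobTracker process (normally the same user that runs it).
    """
    out = subprocess.check_output(
        ["jstat", "-gcutil", str(pid), str(interval_ms), str(count)]
    )
    return parse_gcutil(out.decode())
```

Calling old_gen_alerts(sample_jt(jt_pid)) periodically on both JobTracker hosts would show whether the old generation stays close to full, and the FGC and FGCT columns in the same samples give the number of full collections and the time spent in them.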
