hadoop-common-user mailing list archives

From Shaik M <munna.had...@gmail.com>
Subject NameNode Crashing with "flush failed for required journal" exception
Date Thu, 28 Apr 2016 11:32:44 GMT
Hi All,

I am running an 8-node HDP 2.3 Hadoop cluster (3 master nodes + 5 DataNodes)
with Kerberos security.

The NameNode is configured for HA, and it crashes at least once a day with a
"flush failed for required journal" exception. There are no network issues
between the nodes.

I have tried to find the cause of the issue, but I could not find a proper
resolution. Please help me fix this issue.

Thank you,
Shaik

2016-04-28 05:05:23,159 WARN  client.QuorumJournalManager (QuorumCall.java:waitFor(134)) - Waited 18015 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [10.192.149.194:8485]
2016-04-28 05:05:23,483 INFO  BlockStateChange (BlockManager.java:computeReplicationWorkForBlocks(1522)) - BLOCK* neededReplications = 0, pendingReplications = 0.
2016-04-28 05:05:24,160 WARN  client.QuorumJournalManager (QuorumCall.java:waitFor(134)) - Waited 19016 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [10.192.149.194:8485]
2016-04-28 05:05:25,145 FATAL namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(398)) - Error: flush failed for required journal (JournalAndStream(mgr=QJM to [10.192.149.187:8485, 10.192.149.195:8485, 10.192.149.194:8485], stream=QuorumOutputStream starting at txid 26198626))
java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
        at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
        at org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
        at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
        at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:533)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:57)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:529)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:647)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:3492)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:787)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:536)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2137)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2133)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2131)
2016-04-28 05:05:25,147 WARN  client.QuorumJournalManager (QuorumOutputStream.java:abort(72)) - Aborting QuorumOutputStream starting at txid 26198626
2016-04-28 05:05:25,150 INFO  util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
2016-04-28 05:05:25,160 INFO  namenode.NameNode (LogAdapter.java:info(47)) - SHUTDOWN_MSG:
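
To see how often sendEdits calls approach the 20000 ms timeout, and which JournalNodes had acknowledged by then, the QuorumJournalManager warnings in the log above could be pulled out with a small script. This is only a rough sketch, not part of Hadoop: the names WAIT_RE, find_slow_quorum_calls and SAMPLE are made up for illustration, the regex is derived solely from the two warnings quoted in this message, and it assumes each log record sits on a single line, as in the NameNode log file itself rather than in a wrapped mail.

```python
import re
from typing import Iterable, List, Tuple

# Pattern inferred from the "Waited ... ms (timeout=... ms)" warnings quoted
# in this message. Other Hadoop versions may word the warning differently.
WAIT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) WARN\s+"
    r"client\.QuorumJournalManager .*?"
    r"Waited (?P<waited>\d+) ms \(timeout=(?P<timeout>\d+) ms\) "
    r"for a response for (?P<op>\w+)\. "
    r"Succeeded so far: \[(?P<ok>[^\]]*)\]"
)


def find_slow_quorum_calls(
    lines: Iterable[str],
) -> List[Tuple[str, int, int, List[str]]]:
    """Return (timestamp, waited_ms, timeout_ms, acked_nodes) per warning."""
    results = []
    for line in lines:
        match = WAIT_RE.search(line)
        if match is None:
            continue  # not a quorum-wait warning
        acked = [a.strip() for a in match.group("ok").split(",") if a.strip()]
        results.append(
            (
                match.group("ts"),
                int(match.group("waited")),
                int(match.group("timeout")),
                acked,
            )
        )
    return results


# The two warnings from this message, joined back into single lines.
SAMPLE = [
    "2016-04-28 05:05:23,159 WARN  client.QuorumJournalManager "
    "(QuorumCall.java:waitFor(134)) - Waited 18015 ms (timeout=20000 ms) "
    "for a response for sendEdits. Succeeded so far: [10.192.149.194:8485]",
    "2016-04-28 05:05:24,160 WARN  client.QuorumJournalManager "
    "(QuorumCall.java:waitFor(134)) - Waited 19016 ms (timeout=20000 ms) "
    "for a response for sendEdits. Succeeded so far: [10.192.149.194:8485]",
]

# JournalNode addresses as listed in the FATAL line of this message.
JOURNAL_NODES = {
    "10.192.149.187:8485",
    "10.192.149.195:8485",
    "10.192.149.194:8485",
}

if __name__ == "__main__":
    for ts, waited, timeout, acked in find_slow_quorum_calls(SAMPLE):
        pending = sorted(JOURNAL_NODES - set(acked))
        print(f"{ts} waited {waited}/{timeout} ms, no ack yet from {pending}")
```

Run against the full NameNode log (for example by passing an open file object instead of SAMPLE), this might show whether the same JournalNodes are consistently the ones that have not responded before each crash.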
