hadoop-common-user mailing list archives

From Gagan Brahmi <gaganbra...@gmail.com>
Subject Re: NameNode Crashing with "flush failed for required journal" exception
Date Thu, 28 Apr 2016 14:32:03 GMT
Hi Shaik,

The error indicates that the NameNode shut down because the write and
sync to a quorum of JournalNodes did not complete in time. In your case,
at least 2 of the 3 JournalNodes must complete the write and sync within
the timeout period of 20 seconds, which does not seem to be happening.
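
To make the quorum arithmetic concrete, here is a minimal sketch in
Python. It is a simplified model and not HDFS code: the function names
and the list-of-latencies input are made up for illustration, and the
real QuorumJournalManager waits on asynchronous calls rather than a
list. It only shows why one acknowledgement out of three is not enough.

```python
def quorum_size(num_journal_nodes):
    """Smallest majority of the configured JournalNodes."""
    return num_journal_nodes // 2 + 1


def write_succeeds(ack_times_ms, timeout_ms=20000):
    """Return True if a majority acknowledged before the timeout.

    ack_times_ms holds one entry per JournalNode: the time in
    milliseconds it took to acknowledge, or None if it never answered.
    (Hypothetical input shape, chosen for illustration.)
    """
    needed = quorum_size(len(ack_times_ms))
    on_time = sum(
        1 for t in ack_times_ms if t is not None and t < timeout_ms
    )
    return on_time >= needed
```

With three JournalNodes, quorum_size(3) is 2, so a single timely
response such as write_succeeds([50, None, None]) evaluates to False.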

I would advise you to check the JournalNode logs; you should find
something interesting in them, perhaps the reason the write and sync
operations are failing to complete on the JournalNodes.
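
To decide which JournalNode logs to look at first, you can compare the
configured JournalNodes with the ones that did respond. The sketch below
does that from the NameNode log text. It is only an illustration: the
helper names are hypothetical, and it assumes the log contains the
"QJM to [...]" and "Succeeded so far: [...]" fragments seen in the log
quoted below.

```python
import re

ADDR_LIST = r"\[([^\]]*)\]"  # captures the text between square brackets


def _addresses(bracket_body):
    """Split a comma-separated 'host:port' list into a set."""
    return {a.strip() for a in bracket_body.split(",") if a.strip()}


def lagging_journal_nodes(log_text):
    """Return configured JournalNodes missing from the last ack list.

    Email quoting ("> ") and line wrapping are removed first so that
    address lists split across lines are matched as a whole.
    """
    flat = " ".join(
        line.lstrip("> ").strip() for line in log_text.splitlines()
    )
    configured = re.search(r"QJM to " + ADDR_LIST, flat)
    succeeded = re.findall(r"Succeeded so far: " + ADDR_LIST, flat)
    if configured is None:
        return set()
    acked = _addresses(succeeded[-1]) if succeeded else set()
    return _addresses(configured.group(1)) - acked
```

For the log quoted below, this returns 10.192.149.187:8485 and
10.192.149.195:8485, so those two JournalNodes are the ones to check.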


Regards,
Gagan Brahmi

On Thu, Apr 28, 2016 at 4:32 AM, Shaik M <munna.hadoop@gmail.com> wrote:
> Hi All,
>
> I am running an 8-node HDP 2.3 Hadoop cluster (3 master + 5 DataNodes)
> with Kerberos security.
>
> The NameNode has HA enabled, and it is crashing at least once a day with
> a "flush failed for required journal" exception. There are no network
> issues between the nodes.
>
> I have tried to find the cause of the issue, but I could not find a
> proper resolution. Please help me fix this issue.
>
> Thank you,
> Shaik
>
> 2016-04-28 05:05:23,159 WARN  client.QuorumJournalManager
> (QuorumCall.java:waitFor(134)) - Waited 18015 ms (timeout=20000 ms) for a
> response for sendEdits. Succeeded so far: [10.192.149.194:8485]
> 2016-04-28 05:05:23,483 INFO  BlockStateChange
> (BlockManager.java:computeReplicationWorkForBlocks(1522)) - BLOCK*
> neededReplications = 0, pendingReplications = 0.
> 2016-04-28 05:05:24,160 WARN  client.QuorumJournalManager
> (QuorumCall.java:waitFor(134)) - Waited 19016 ms (timeout=20000 ms) for a
> response for sendEdits. Succeeded so far: [10.192.149.194:8485]
> 2016-04-28 05:05:25,145 FATAL namenode.FSEditLog
> (JournalSet.java:mapJournalsAndReportErrors(398)) - Error: flush failed for
> required journal (JournalAndStream(mgr=QJM to [10.192.149.187:8485,
> 10.192.149.195:8485, 10.192.149.194:8485], stream=QuorumOutputStream
> starting at txid 26198626))
> java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to
> respond.
>         at
> org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
>         at
> org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
>         at
> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
>         at
> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
>         at
> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:533)
>         at
> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393)
>         at
> org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:57)
>         at
> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:529)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:647)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:3492)
>         at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:787)
>         at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:536)
>         at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>         at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2137)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2133)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2131)
> 2016-04-28 05:05:25,147 WARN  client.QuorumJournalManager
> (QuorumOutputStream.java:abort(72)) - Aborting QuorumOutputStream starting
> at txid 26198626
> 2016-04-28 05:05:25,150 INFO  util.ExitUtil (ExitUtil.java:terminate(124)) -
> Exiting with status 1
> 2016-04-28 05:05:25,160 INFO  namenode.NameNode (LogAdapter.java:info(47)) -
> SHUTDOWN_MSG:
>
