hadoop-common-user mailing list archives

From Artem Ervits <artemerv...@gmail.com>
Subject Re: Active Namenode keeps crashing
Date Wed, 12 Aug 2015 07:05:11 GMT
Check whether connectivity between the servers is stable. The error says it
can't reach one node. Also check that the time is synced between the nodes.
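If it helps, here's a rough connectivity probe you can run from the namenode
host (a minimal sketch; the hostnames below are placeholders, and 8485 is
assumed to be the default journal node RPC port in your setup):

import java.net.InetSocketAddress;
import java.net.Socket;

public class JournalNodeProbe {
    public static void main(String[] args) {
        // Placeholder hostnames -- substitute your actual journal nodes.
        String[] hosts = {"journal1.example.com", "journal2.example.com", "journal3.example.com"};
        int port = 8485;          // default dfs.journalnode.rpc-address port (adjust if changed)
        int timeoutMillis = 2000; // fail fast if a node is unreachable

        for (String host : hosts) {
            long start = System.currentTimeMillis();
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(host, port), timeoutMillis);
                long elapsed = System.currentTimeMillis() - start;
                System.out.println(host + ":" + port + " reachable in " + elapsed + " ms");
            } catch (Exception e) {
                System.out.println(host + ":" + port + " NOT reachable: " + e.getMessage());
            }
        }
    }
}

Running it in a loop for a while should show whether connections to any
journal node time out or slow down. For the clock check, comparing the
offsets reported by "ntpq -p" on each node is usually enough.
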
On Aug 9, 2015 9:31 PM, "Suresh V" <verditer@gmail.com> wrote:

> In our HA setup, the active namenode keeps crashing about once a week. The
> cluster is quite idle, with few jobs running and not much user activity.
>
> Below are the logs from the journal nodes. Can someone help us with this, please?
>
>
> 2015-08-04 13:00:20,054 INFO  server.Journal
> (Journal.java:updateLastPromisedEpoch(315)) - Updating lastPromisedEpoch
> from 9 to 10 for client /172.26.44.133
>
> 2015-08-04 13:00:20,175 INFO  server.Journal
> (Journal.java:scanStorageForLatestEdits(188)) - Scanning storage
> FileJournalManager(root=/hadoop/hdfs/journal/HDPPROD)
>
> 2015-08-04 13:00:20,220 INFO  server.Journal
> (Journal.java:scanStorageForLatestEdits(194)) - Latest log is
> EditLogFile(file=/hadoop/hdfs/journal/HDPPROD/current/edits_inprogress_0000000000000523903,first=0000000000000523903,last=0000000000000523925,inProgress=true,hasCorruptHeader=false)
>
> 2015-08-04 13:00:20,891 INFO  server.Journal
> (Journal.java:getSegmentInfo(687)) - getSegmentInfo(523903):
> EditLogFile(file=/hadoop/hdfs/journal/HDPPROD/current/edits_inprogress_0000000000000523903,first=0000000000000523903,last=0000000000000523925,inProgress=true,hasCorruptHeader=false)
> -> startTxId: 523903 endTxId: 523925 isInProgress: true
>
> 2015-08-04 13:00:20,891 INFO  server.Journal
> (Journal.java:prepareRecovery(731)) - Prepared recovery for segment 523903:
> segmentState { startTxId: 523903 endTxId: 523925 isInProgress: true }
> lastWriterEpoch: 9 lastCommittedTxId: 523924
>
> 2015-08-04 13:00:20,956 INFO  server.Journal
> (Journal.java:getSegmentInfo(687)) - getSegmentInfo(523903):
> EditLogFile(file=/hadoop/hdfs/journal/HDPPROD/current/edits_inprogress_0000000000000523903,first=0000000000000523903,last=0000000000000523925,inProgress=true,hasCorruptHeader=false)
> -> startTxId: 523903 endTxId: 523925 isInProgress: true
>
> 2015-08-04 13:00:20,956 INFO  server.Journal
> (Journal.java:acceptRecovery(817)) - Skipping download of log startTxId:
> 523903 endTxId: 523925 isInProgress: true: already have up-to-date logs
>
> 2015-08-04 13:00:20,989 INFO  server.Journal
> (Journal.java:acceptRecovery(850)) - Accepted recovery for segment 523903:
> segmentState { startTxId: 523903 endTxId: 523925 isInProgress: true }
> acceptedInEpoch: 10
>
> 2015-08-04 13:00:21,791 INFO  server.Journal
> (Journal.java:finalizeLogSegment(584)) - Validating log segment
> /hadoop/hdfs/journal/HDPPROD/current/edits_inprogress_0000000000000523903
> about to be finalized
>
> 2015-08-04 13:00:21,805 INFO  namenode.FileJournalManager
> (FileJournalManager.java:finalizeLogSegment(133)) - Finalizing edits file
> /hadoop/hdfs/journal/HDPPROD/current/edits_inprogress_0000000000000523903
> ->
> /hadoop/hdfs/journal/HDPPROD/current/edits_0000000000000523903-0000000000000523925
>
> 2015-08-04 13:00:22,257 INFO  server.Journal
> (Journal.java:startLogSegment(532)) - Updating lastWriterEpoch from 9 to 10
> for client /172.26.44.133
>
> 2015-08-04 13:00:23,699 INFO  ipc.Server (Server.java:run(2060)) - IPC
> Server handler 4 on 8485, call
> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocol.journal from
> 172.26.44.135:43678 Call#304302 Retry#0
>
> java.io.IOException: IPC's epoch 9 is less than the last promised epoch 10
>
>         at
> org.apache.hadoop.hdfs.qjournal.server.Journal.checkRequest(Journal.java:414)
>
>         at
> org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:442)
>
>         at
> org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:342)
>
>         at
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:148)
>
>         at
> org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:158)
>
>         at
> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25421)
>
>         at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
>
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
>
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
>
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
>
>         at java.security.AccessController.doPrivileged(Native Method)
>
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>
>         at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
>
> 2015-08-06 19:13:14,012 INFO  httpclient.HttpMethodDirector
> (HttpMethodDirector.java:executeWithRetry(439)) - I/O exception
> (org.apache.commons.httpclient.NoHttpResponseException) caught when
> processing request: The server az-easthdpmnp02.metclouduseast.com failed
> to respond
>
>
>
>
> Thank you
>
> Suresh.
>
>
>
