hadoop-hdfs-user mailing list archives

From Suresh V <verdi...@gmail.com>
Subject Active Namenode keeps crashing
Date Mon, 10 Aug 2015 01:12:46 GMT
In our HA setup, the active NameNode crashes roughly once a week. The
cluster is fairly idle, with few jobs running and little user activity.

Below are the logs from the JournalNodes. Can someone please help us with this?


2015-08-04 13:00:20,054 INFO  server.Journal (Journal.java:updateLastPromisedEpoch(315)) - Updating lastPromisedEpoch from 9 to 10 for client /172.26.44.133

2015-08-04 13:00:20,175 INFO  server.Journal (Journal.java:scanStorageForLatestEdits(188)) - Scanning storage FileJournalManager(root=/hadoop/hdfs/journal/HDPPROD)

2015-08-04 13:00:20,220 INFO  server.Journal (Journal.java:scanStorageForLatestEdits(194)) - Latest log is EditLogFile(file=/hadoop/hdfs/journal/HDPPROD/current/edits_inprogress_0000000000000523903,first=0000000000000523903,last=0000000000000523925,inProgress=true,hasCorruptHeader=false)

2015-08-04 13:00:20,891 INFO  server.Journal (Journal.java:getSegmentInfo(687)) - getSegmentInfo(523903): EditLogFile(file=/hadoop/hdfs/journal/HDPPROD/current/edits_inprogress_0000000000000523903,first=0000000000000523903,last=0000000000000523925,inProgress=true,hasCorruptHeader=false) -> startTxId: 523903 endTxId: 523925 isInProgress: true

2015-08-04 13:00:20,891 INFO  server.Journal (Journal.java:prepareRecovery(731)) - Prepared recovery for segment 523903: segmentState { startTxId: 523903 endTxId: 523925 isInProgress: true } lastWriterEpoch: 9 lastCommittedTxId: 523924

2015-08-04 13:00:20,956 INFO  server.Journal (Journal.java:getSegmentInfo(687)) - getSegmentInfo(523903): EditLogFile(file=/hadoop/hdfs/journal/HDPPROD/current/edits_inprogress_0000000000000523903,first=0000000000000523903,last=0000000000000523925,inProgress=true,hasCorruptHeader=false) -> startTxId: 523903 endTxId: 523925 isInProgress: true

2015-08-04 13:00:20,956 INFO  server.Journal (Journal.java:acceptRecovery(817)) - Skipping download of log startTxId: 523903 endTxId: 523925 isInProgress: true: already have up-to-date logs

2015-08-04 13:00:20,989 INFO  server.Journal (Journal.java:acceptRecovery(850)) - Accepted recovery for segment 523903: segmentState { startTxId: 523903 endTxId: 523925 isInProgress: true } acceptedInEpoch: 10

2015-08-04 13:00:21,791 INFO  server.Journal (Journal.java:finalizeLogSegment(584)) - Validating log segment /hadoop/hdfs/journal/HDPPROD/current/edits_inprogress_0000000000000523903 about to be finalized

2015-08-04 13:00:21,805 INFO  namenode.FileJournalManager (FileJournalManager.java:finalizeLogSegment(133)) - Finalizing edits file /hadoop/hdfs/journal/HDPPROD/current/edits_inprogress_0000000000000523903 -> /hadoop/hdfs/journal/HDPPROD/current/edits_0000000000000523903-0000000000000523925

2015-08-04 13:00:22,257 INFO  server.Journal (Journal.java:startLogSegment(532)) - Updating lastWriterEpoch from 9 to 10 for client /172.26.44.133

2015-08-04 13:00:23,699 INFO  ipc.Server (Server.java:run(2060)) - IPC Server handler 4 on 8485, call org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocol.journal from 172.26.44.135:43678 Call#304302 Retry#0
java.io.IOException: IPC's epoch 9 is less than the last promised epoch 10
        at org.apache.hadoop.hdfs.qjournal.server.Journal.checkRequest(Journal.java:414)
        at org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:442)
        at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:342)
        at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:148)
        at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:158)
        at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25421)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)

2015-08-06 19:13:14,012 INFO  httpclient.HttpMethodDirector (HttpMethodDirector.java:executeWithRetry(439)) - I/O exception (org.apache.commons.httpclient.NoHttpResponseException) caught when processing request: The server az-easthdpmnp02.metclouduseast.com failed to respond

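For reference, the IOException above comes from the QJM epoch-fencing check on the JournalNode. The Java below is a simplified, hypothetical model of that behaviour; the class EpochFencingSketch and its methods are illustrative assumptions, not the real org.apache.hadoop.hdfs.qjournal.server.Journal API. A JournalNode remembers the highest epoch it has promised to a writer and rejects write calls that carry a lower epoch. In the logs above, client 172.26.44.133 is promised epoch 10, and a later journal call from 172.26.44.135 still carries epoch 9, so it is refused.

```java
import java.io.IOException;

/**
 * Hypothetical, simplified model of QJM epoch fencing on one JournalNode.
 * Not the real org.apache.hadoop.hdfs.qjournal.server.Journal class.
 */
public class EpochFencingSketch {

    // Highest epoch this journal has promised to any writer (illustrative field).
    private long lastPromisedEpoch;

    public EpochFencingSketch(long initialEpoch) {
        this.lastPromisedEpoch = initialEpoch;
    }

    // A new writer (e.g. a NameNode becoming active) asks for a strictly higher epoch.
    public synchronized void newEpoch(long epoch) throws IOException {
        if (epoch <= lastPromisedEpoch) {
            throw new IOException("Proposed epoch " + epoch
                    + " <= last promise " + lastPromisedEpoch);
        }
        lastPromisedEpoch = epoch;
    }

    // Every write carries the writer's epoch; writers with an older epoch are fenced off.
    public synchronized void journal(long writerEpoch) throws IOException {
        if (writerEpoch < lastPromisedEpoch) {
            throw new IOException("IPC's epoch " + writerEpoch
                    + " is less than the last promised epoch " + lastPromisedEpoch);
        }
        // A real JournalNode would append the edit batch to the in-progress segment here.
    }

    public synchronized long getLastPromisedEpoch() {
        return lastPromisedEpoch;
    }

    public static void main(String[] args) throws IOException {
        EpochFencingSketch jn = new EpochFencingSketch(9);
        jn.journal(9);      // writer with epoch 9 is accepted
        jn.newEpoch(10);    // another writer is promised epoch 10
        try {
            jn.journal(9);  // the old writer, still on epoch 9, is rejected
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running main prints the same message seen in the JournalNode log, because the old writer's epoch (9) is below the promised epoch (10).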

Thank you

Suresh.