From: Suresh V
To: user@hadoop.apache.org
Date: Sun, 9 Aug 2015 20:12:46 -0500
Subject: Active Namenode keeps crashing

In our HA setup, the active NameNode keeps crashing about once a week. The cluster is fairly idle, with few jobs running and little user activity.

Below are the logs from the journal nodes. Can someone help us with this, please?
2015-08-04 13:00:20,054 INFO  server.Journal (Journal.java:updateLastPromisedEpoch(315)) - Updating lastPromisedEpoch from 9 to 10 for client /172.26.44.133
2015-08-04 13:00:20,175 INFO  server.Journal (Journal.java:scanStorageForLatestEdits(188)) - Scanning storage FileJournalManager(root=/hadoop/hdfs/journal/HDPPROD)
2015-08-04 13:00:20,220 INFO  server.Journal (Journal.java:scanStorageForLatestEdits(194)) - Latest log is EditLogFile(file=/hadoop/hdfs/journal/HDPPROD/current/edits_inprogress_0000000000000523903,first=0000000000000523903,last=0000000000000523925,inProgress=true,hasCorruptHeader=false)
2015-08-04 13:00:20,891 INFO  server.Journal (Journal.java:getSegmentInfo(687)) - getSegmentInfo(523903): EditLogFile(file=/hadoop/hdfs/journal/HDPPROD/current/edits_inprogress_0000000000000523903,first=0000000000000523903,last=0000000000000523925,inProgress=true,hasCorruptHeader=false) -> startTxId: 523903 endTxId: 523925 isInProgress: true
2015-08-04 13:00:20,891 INFO  server.Journal (Journal.java:prepareRecovery(731)) - Prepared recovery for segment 523903: segmentState { startTxId: 523903 endTxId: 523925 isInProgress: true } lastWriterEpoch: 9 lastCommittedTxId: 523924
2015-08-04 13:00:20,956 INFO  server.Journal (Journal.java:getSegmentInfo(687)) - getSegmentInfo(523903): EditLogFile(file=/hadoop/hdfs/journal/HDPPROD/current/edits_inprogress_0000000000000523903,first=0000000000000523903,last=0000000000000523925,inProgress=true,hasCorruptHeader=false) -> startTxId: 523903 endTxId: 523925 isInProgress: true
2015-08-04 13:00:20,956 INFO  server.Journal (Journal.java:acceptRecovery(817)) - Skipping download of log startTxId: 523903 endTxId: 523925 isInProgress: true: already have up-to-date logs
2015-08-04 13:00:20,989 INFO  server.Journal (Journal.java:acceptRecovery(850)) - Accepted recovery for segment 523903: segmentState { startTxId: 523903 endTxId: 523925 isInProgress: true } acceptedInEpoch: 10
2015-08-04 13:00:21,791 INFO  server.Journal (Journal.java:finalizeLogSegment(584)) - Validating log segment /hadoop/hdfs/journal/HDPPROD/current/edits_inprogress_0000000000000523903 about to be finalized
2015-08-04 13:00:21,805 INFO  namenode.FileJournalManager (FileJournalManager.java:finalizeLogSegment(133)) - Finalizing edits file /hadoop/hdfs/journal/HDPPROD/current/edits_inprogress_0000000000000523903 -> /hadoop/hdfs/journal/HDPPROD/current/edits_0000000000000523903-0000000000000523925
2015-08-04 13:00:22,257 INFO  server.Journal (Journal.java:startLogSegment(532)) - Updating lastWriterEpoch from 9 to 10 for client /172.26.44.133
2015-08-04 13:00:23,699 INFO  ipc.Server (Server.java:run(2060)) - IPC Server handler 4 on 8485, call org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocol.journal from 172.26.44.135:43678 Call#304302 Retry#0
java.io.IOException: IPC's epoch 9 is less than the last promised epoch 10
        at org.apache.hadoop.hdfs.qjournal.server.Journal.checkRequest(Journal.java:414)
        at org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:442)
        at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:342)
        at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:148)
        at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:158)
        at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25421)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)

2015-08-06 19:13:14,012 INFO  httpclient.HttpMethodDirector (HttpMethodDirector.java:executeWithRetry(439)) - I/O exception (org.apache.commons.httpclient.NoHttpResponseException) caught when processing request: The server az-easthdpmnp02.metclouduseast.com failed to respond
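For context, my (possibly wrong) reading of the "IPC's epoch 9 is less than the last promised epoch 10" message is that the JournalNode is fencing off a writer whose epoch has gone stale after a failover: whichever NameNode becomes active promises a higher epoch to the JournalNodes, and any later journal() write carrying an older epoch gets rejected. Below is a minimal, hypothetical Java sketch of that idea only; the class and method names are simplified and are not the actual org.apache.hadoop.hdfs.qjournal.server.Journal code.

import java.io.IOException;

// Hypothetical sketch of JournalNode writer-epoch fencing (not the real Journal class).
public class EpochFencingSketch {

    // Highest epoch this journal has promised to honor.
    private long lastPromisedEpoch = 0;

    // A NameNode becoming the writer must present a strictly higher epoch.
    public synchronized long newEpoch(long proposedEpoch) throws IOException {
        if (proposedEpoch <= lastPromisedEpoch) {
            throw new IOException("Proposed epoch " + proposedEpoch
                    + " is not greater than the last promised epoch " + lastPromisedEpoch);
        }
        lastPromisedEpoch = proposedEpoch;  // cf. "Updating lastPromisedEpoch from 9 to 10"
        return lastPromisedEpoch;
    }

    // Every journal() write is checked; a stale writer is rejected.
    public synchronized void checkWriteRequest(long requestEpoch) throws IOException {
        if (requestEpoch < lastPromisedEpoch) {
            // Same condition as the IOException in the stack trace above.
            throw new IOException("IPC's epoch " + requestEpoch
                    + " is less than the last promised epoch " + lastPromisedEpoch);
        }
    }

    public static void main(String[] args) throws IOException {
        EpochFencingSketch jn = new EpochFencingSketch();
        jn.newEpoch(9);           // first NameNode becomes writer with epoch 9
        jn.newEpoch(10);          // the other NameNode takes over and promises epoch 10
        jn.checkWriteRequest(9);  // a late write from the old writer throws IOException
    }
}

If that reading is correct, the epoch bump from 9 to 10 for client /172.26.44.133 together with the rejected write from 172.26.44.135 suggests a failover happened just before the crash, so the question may really be what triggered that failover in the first place.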
Thank you

Suresh.

