From: Venkatesh
To: user@hbase.apache.org
Subject: Re: mapreduce job failure
Date: Mon, 16 May 2011 17:08:21 -0400

Thanks J-D

Using hbase-0.20.6, 49 node cluster

The map reduce job involves a full table scan (region size 4 gig).
The job runs great for 1 week.
It starts failing after 1 week of data accumulation (about 3000 regions).
About 400 regions get created per day.

Can you suggest any tunables at the HBase level or the HDFS level?

Also, I've one more issue when region servers die. Errors below
(any suggestion here is helpful as well):

org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /hbase_data_one_110425/.../compaction.dir/249610074/4534752250560182124 File does not exist. Holder DFSClient_-398073404 does not have any open files.
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1332)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1323)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1251)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
        at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)


-----Original Message-----
From: Jean-Daniel Cryans
To: user@hbase.apache.org
Sent: Fri, May 13, 2011 12:39 am
Subject: Re: mapreduce job failure

All that means is that the task stayed in map() for 10 minutes,
blocked on "something".

If you were scanning an hbase table, and didn't get a new row after 1
minute, then the scanner would expire. That's orthogonal tho.

You need to figure what you're blocking on, add logging and try to
jstack your Child processes for example.

J-D

On Thu, May 12, 2011 at 7:21 PM, Venkatesh wrote:
>
> Hi
> Using hbase-0.20.6
>
> mapreduce job started failing in the map phase (using hbase table as input for
> mapper)..(ran fine for a week or so starting with empty tables)..
>
> task tracker log:
>
> Task attempt_201105121141_0002_m_000452_0 failed to report status for 600
> seconds. Killing
>
> Region server log:
>
> 2011-05-12 18:27:39,919 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner -7857209327501974146 lease expired
>
> 2011-05-12 18:28:29,716 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: org.apache.hadoop.hbase.UnknownScannerException: Name: -7857209327501974146
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1880)
>         at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
>
> 2011-05-12 18:28:29,897 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60020, call next(-7857209327501974146, 1) from .....:35202: error: org.apache.hadoop.hbase.UnknownScannerException: Name: -7857209327501974146
> org.apache.hadoop.hbase.UnknownScannerException: Name: -7857209327501974146
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1880)
>         at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
>
> I don't see any error in datanodes
>
> Appreciate any help
> thanks
> v
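
One way to act on the "add logging and try to jstack" suggestion from inside the task itself is sketched below. It is only a sketch under assumptions: StallProbe, watched and dumpThreads are hypothetical names, not part of HBase or Hadoop, and the code uses nothing beyond the JDK (java.lang.management and java.util.concurrent). The idea is to wrap whatever call inside map() is suspected of blocking; if it has not returned within a chosen threshold, a timer thread reports a jstack-like dump of every thread in the JVM, which should show what the task is blocked on before the 600 second kill arrives. The threshold values here are arbitrary.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical helper for illustration; not an HBase or Hadoop class. */
final class StallProbe {
    private StallProbe() {}

    /** Returns a jstack-like dump (name, state, frames) of every live thread in this JVM. */
    static String dumpThreads() {
        StringBuilder sb = new StringBuilder();
        // false, false: skip locked-monitor and synchronizer details to keep the dump cheap.
        ThreadInfo[] infos = ManagementFactory.getThreadMXBean().dumpAllThreads(false, false);
        for (ThreadInfo info : infos) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    /**
     * Runs step on the calling thread. If it is still running after warnMillis,
     * a daemon timer thread passes a labelled thread dump to report.
     */
    static <T> T watched(String label, long warnMillis, Consumer<String> report,
                         Callable<T> step) throws Exception {
        // One executor per call keeps the sketch short; a real job would share one.
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "stall-probe");
            t.setDaemon(true);
            return t;
        });
        ScheduledFuture<?> alarm = timer.schedule(
            () -> report.accept("STALLED in " + label + "\n" + dumpThreads()),
            warnMillis, TimeUnit.MILLISECONDS);
        try {
            return step.call();
        } finally {
            alarm.cancel(false);   // no report if the step finished in time
            timer.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Thread.sleep stands in for the unknown blocking call inside map().
        String result = watched("slow-step", 100, System.err::println, () -> {
            Thread.sleep(500);
            return "done";
        });
        System.out.println(result);
    }
}
```

Running jstack against the Child process from outside gives the same kind of information without code changes; the in-process version is mainly useful when the stall is intermittent and hard to catch by hand.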