Subject: Re: Mapred job failing with LeaseException
From: Suraj Varma
To: user@hbase.apache.org
Date: Wed, 11 Jul 2012 16:22:21 -0700

The reason you get LeaseExceptions is that the time between two
scanner.next() calls exceeded your hbase.regionserver.lease.period
setting, which defaults to 60s.

Whether it is your "client" or your "map task", if it opens a Scan
against HBase, scanner.next() has to keep getting invoked within this
lease period. Otherwise the client is considered dead and the lease
expires. When this "dead" client comes back and tries to do a
scanner.next(), it gets a LeaseException.

There are several threads on this, so google for "hbase scanner
leaseexception" and such.

See:
http://mail-archives.apache.org/mod_mbox/hbase-user/200903.mbox/%3Cfa03480d0903110823l5678e8dem353f345483799c5@mail.gmail.com%3E
http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/10225

Are you doing some processing in between two scanner.next() calls
that, over time, takes more than 60s?
--Suraj

On Wed, Jul 11, 2012 at 1:23 AM, 최우용 wrote:
> Hi,
>
> I'm running a cluster of a few hundred servers with Cloudera's CDH3u4
> HBase+Hadoop, and am having trouble with what I think is a simple map
> job which uses an HBase table as an input.
> My mapper code is org.apache.hadoop.hbase.mapreduce.Export with a few
> SingleColumnValueFilter(i.e. a FilterList) added to the Scan object.
> The job seems to progress without any trouble at first, but after
> about 5~7 minutes, when a little over 50% of map tasks complete,
> I suddenly see a lot of LeaseExceptions and the job ultimately fails.
>
> Here's the stack print I see on my failed tasks:
>
> org.apache.hadoop.hbase.regionserver.LeaseException:
> org.apache.hadoop.hbase.regionserver.LeaseException: lease '7595201038414594449' does not exist
>     at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:230)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1881)
>     at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
>     at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>     at
>
> I kind of had a similar problem when I was scanning a particular
> region using ResultScanner in a single-threaded manner with the same
> filters mentioned above,
> but I assumed it wouldn't be a problem in mapred since it's more
> resilient to single task errors.
>
> I tried row caching with Scan.setCaching() and lowered the
> mapred.tasktracker.map.tasks.maximum property in hopes of reducing the
> total load on region servers, but nothing worked.
>
> Could this be a filter performance problem preventing region servers
> from responding before lease expiration?
> Or maybe a long sequence of rows doesn't match my filter list and the
> lease expires before it finally hits the one that does?
>
> I'm kind of new to Hadoop map-reduce and HBase, so any pointers would
> be very much appreciated.
> Thanks.
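
To make the lease behaviour described in the reply concrete, here is a
minimal toy model in plain Java. It is not HBase code: LeaseSketch,
LeaseTable and LeaseExpiredException are hypothetical names invented for
this illustration, and the model checks expiry lazily on each call, which
is a simplification and not a claim about how the region server tracks
leases internally. It only shows the rule stated above: each next() call
must arrive within the lease period of the previous one, or the scanner's
lease is gone.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of scanner leases. Hypothetical names; NOT the HBase implementation. */
class LeaseSketch {

    /** Stands in for HBase's LeaseException in this sketch. */
    static class LeaseExpiredException extends RuntimeException {
        LeaseExpiredException(String msg) { super(msg); }
    }

    /** Hypothetical server-side table: scanner id -> time at which its lease expires. */
    static class LeaseTable {
        private final long leasePeriodMs;
        private final Map<Long, Long> expiryByScanner = new HashMap<>();

        LeaseTable(long leasePeriodMs) { this.leasePeriodMs = leasePeriodMs; }

        /** Opening a scanner creates a lease valid for one lease period. */
        void open(long scannerId, long nowMs) {
            expiryByScanner.put(scannerId, nowMs + leasePeriodMs);
        }

        /** Models one scanner.next() call reaching the server at time nowMs. */
        void next(long scannerId, long nowMs) {
            Long expiry = expiryByScanner.get(scannerId);
            if (expiry != null && nowMs > expiry) {
                // Assumption of this sketch: expired leases are dropped lazily here.
                expiryByScanner.remove(scannerId);
                expiry = null;
            }
            if (expiry == null) {
                throw new LeaseExpiredException("lease '" + scannerId + "' does not exist");
            }
            // A call that arrives in time renews the lease for another period.
            expiryByScanner.put(scannerId, nowMs + leasePeriodMs);
        }
    }

    public static void main(String[] args) {
        LeaseTable leases = new LeaseTable(60_000L); // 60s, the default quoted above
        leases.open(42L, 0L);
        leases.next(42L, 30_000L);  // 30s after open: renewed
        leases.next(42L, 80_000L);  // 50s after the previous call: renewed
        try {
            leases.next(42L, 150_000L); // 70s gap: the lease has expired
        } catch (LeaseExpiredException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

In this model the total scan time does not matter, only the largest gap
between consecutive calls. That is why slow per-row processing in the
client, or anything else that delays the next call past the lease period,
ends up as a "does not exist" error on the following next().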