From: Ameya Kanitkar
Date: Wed, 28 Aug 2013 08:00:55 -0700
Subject: Lease Exception Errors When Running Heavy Map Reduce Job
To: user@hbase.apache.org

Hi All,

We have a very heavy MapReduce job that scans an entire HBase table holding more than 1 TB of data and exports all of it to HDFS. It is similar to the Export job, but with some additional custom code built in.

However, this job is not very stable. It often fails with the following error:

org.apache.hadoop.hbase.regionserver.LeaseException: org.apache.hadoop.hbase.regionserver.LeaseException: lease '-4456594242606811626' does not exist
        at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2429)
        at sun.reflect.GeneratedMethodAccessor42.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1400)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.

More detailed logs from the RS are here: http://pastebin.com/xaHF4ksb

We have changed the following HBase settings to counter this problem, but the issue persists:

hbase.regionserver.lease.period 900000
hbase.rpc.timeout 900000

We also reduced the number of mappers per RS to fewer than the number of CPUs available on the box.

We also observed that once the problem occurs, it occurs multiple times on the same RS. All other regions are unaffected. However, different RSs see this problem on different days, and no particular region is causing it either.

We are running HBase 0.94.2 with CDH 4.2.0.

Any ideas?

Ameya
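
For reference, the two overrides listed in the message could be expressed in hbase-site.xml roughly as follows. This is only a sketch: it assumes the values were set in hbase-site.xml, which the message does not state, and the file's other properties are omitted. Both values are in milliseconds, so 900000 corresponds to 15 minutes.

```xml
<!-- Sketch only: assumes the overrides live in hbase-site.xml. -->
<configuration>
  <!-- Scanner lease period, in milliseconds (900000 ms = 15 minutes) -->
  <property>
    <name>hbase.regionserver.lease.period</name>
    <value>900000</value>
  </property>
  <!-- RPC timeout, in milliseconds (900000 ms = 15 minutes) -->
  <property>
    <name>hbase.rpc.timeout</name>
    <value>900000</value>
  </property>
</configuration>
```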