Subject: Re: NPE while obtaining row lock
From: yuzhihong@gmail.com
To: "user@hbase.apache.org"
Date: Tue, 10 Jan 2012 08:04:46 -0800

Thanks for the analysis.
Do you mind opening a Jira?

On Jan 10, 2012, at 7:51 AM, Yves Langisch wrote:

> Still happens with HBase 0.90.5/Hadoop 1.0.0. But I think I have some more insights on this topic. Here is an up-to-date stack trace:
>
> java.lang.NullPointerException
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:986)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.lockRow(HRegionServer.java:2008)
>     at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
>     at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
> Caused by: java.lang.NullPointerException
>     at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.addRowLock(HRegionServer.java:2018)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.lockRow(HRegionServer.java:2004)
>     ... 5 more
>
> After checking the source code I've noticed that the value that is about to be put into the HashMap can be null when the waitForLock flag is true or the rowLockWaitDuration has expired (HRegion#internalObtainRowLock, line 2111ff). I think the latter happens in our case, as we have heavy load hitting the server.
>
> IMHO this case should be handled somehow and must not lead to an NPE.
>
> -
> Yves
>
> On Dec 30, 2011, at 12:12 PM, Yves Langisch wrote:
>
>> Still happens, but before I add some debugging information I'll try to deploy the new version, 0.90.5.
>>
>> -
>> Yves
>>
>> On Dec 18, 2011, at 12:08 AM, Stack wrote:
>>
>>> On Fri, Dec 16, 2011 at 8:20 AM, Yves Langisch wrote:
>>>> I'm using the async hbase client (1.0) and there is no way to choose a lockId on my own:
>>>>
>>>>
>>>> return database.client().lockRow(
>>>>     new RowLockRequest(TableManager.ID_TABLE_NAME, MAXID_ROW)).join();
>>>>
>>>>
>>>>
>>>> Any ideas what else could be wrong here?
>>>>
>>>
>>> Looking at the code on the regionserver side,
>>> http://svn.apache.org/viewvc/hbase/tags/0.90.4/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java?view=markup,
>>> down around line 1994, it's unlikely the region is null since we should
>>> throw NotServingRegionException if we can't find the region (and we check
>>> for a null region name a few lines up), so maybe it's something in the way
>>> we do obtainRowLock on line 1995?
>>>
>>> Any chance of your instrumenting the regionserver? Adding a bit of
>>> debugging and deploying the debugging regionserver?
>>>
>>> My guess is we haven't seen this before because not many use rowlocks
>>> (rowlocks, if long-lived and with lots of contending clients, could freeze
>>> you out of the server; each client blocked waiting on a rowlock to clear
>>> occupies a handler, of which there are a bounded number).
>>>
>>> St.Ack
>>>
>>
>
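
The failure mode in the stack trace above can be reproduced outside HBase. What follows is a minimal sketch, not the HBase source: the class RowLockSketch, the methods obtainRowLock, addRowLockUnguarded and addRowLockGuarded, and the exception message are hypothetical names chosen for illustration. It relies only on the documented behavior that java.util.concurrent.ConcurrentHashMap.put throws NullPointerException for a null key or value, and shows one possible way a null lock id could be reported as an IOException instead of an NPE.

```java
import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical stand-in for the region server's row-lock bookkeeping.
class RowLockSketch {
    // Lock table: client-visible lock name -> internal lock id.
    static final ConcurrentHashMap<String, Integer> rowLocks = new ConcurrentHashMap<>();

    // Assumption for illustration: obtaining a lock yields null when the wait gave up.
    static Integer obtainRowLock(boolean gaveUp) {
        return gaveUp ? null : Integer.valueOf(42);
    }

    // Unguarded registration: ConcurrentHashMap rejects null values,
    // so a null internalLockId surfaces as a NullPointerException here.
    static long addRowLockUnguarded(Integer internalLockId) {
        long lockId = ThreadLocalRandom.current().nextLong();
        rowLocks.put(String.valueOf(lockId), internalLockId);
        return lockId;
    }

    // Guarded registration: check for null before touching the map and
    // report the condition as an IOException the caller can handle.
    static long addRowLockGuarded(Integer internalLockId) throws IOException {
        if (internalLockId == null) {
            throw new IOException("Could not obtain row lock (wait gave up)");
        }
        return addRowLockUnguarded(internalLockId);
    }

    public static void main(String[] args) {
        try {
            addRowLockUnguarded(obtainRowLock(true));
        } catch (NullPointerException e) {
            System.out.println("unguarded: NullPointerException from ConcurrentHashMap.put");
        }
        try {
            addRowLockGuarded(obtainRowLock(true));
        } catch (IOException e) {
            System.out.println("guarded: " + e.getMessage());
        }
    }
}
```

Whether an IOException, a retry, or a dedicated exception type is the right response is a question for the Jira; the sketch only shows the null check happening before the map insert.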