From: "Srikanth P. Shreenivas" <Srikanth_Shreenivas@mindtree.com>
To: user@hbase.apache.org
Subject: RE: HBase Read and Write Issues in Multithreaded Environments
Date: Sun, 10 Jul 2011 11:50:57 +0000

Hi St.Ack,

I noticed that one of the region server machines had its clock running one day in the future. I corrected the date. I ran into some issues after restarting; I was getting errors with respect to .META. and other things which I did not understand much. Also, the status command in the hbase shell was displaying "3 servers, 1 dead", whereas I had only 3 region servers.

So, I cleaned "/hbase" (to get to the real problem) and restarted the HBase nodes.

After starting all 3 HBase nodes, I ran the test app again and observed the log files of all 3 region servers.

I noticed that when the test app seemed hung, the web app's thread that was serving the request had gone to sleep at the code below. I think it stayed like that for around 10 minutes before Tomcat probably interrupted it.
Thread-#8 - Thread t@29
   java.lang.Thread.State: TIMED_WAITING
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:791)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:589)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:564)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:415)
        at org.apache.hadoop.hbase.client.ServerCallable.instantiateServer(ServerCallable.java:57)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1002)
        at org.apache.hadoop.hbase.client.HTable.getRowOrBefore(HTable.java:514)
        at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:133)
        at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:95)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:648)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:702)
        - locked java.lang.Object@75826e08
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:593)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:564)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:415)
        at org.apache.hadoop.hbase.client.ServerCallable.instantiateServer(ServerCallable.java:57)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1002)
        at org.apache.hadoop.hbase.client.HTable.get(HTable.java:546)
        <.. app specific trace removed ...>
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

============================================================

After 10 minutes, web app log showed:

2011-07-10 16:50:28,804 [Thread-#8] ERROR [persistence.handler.HBaseHandler] - Exception occurred in searchData:
java.io.IOException: Giving up trying to get region server: thread is interrupted.
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1016)
        at org.apache.hadoop.hbase.client.HTable.get(HTable.java:546)

============================================================
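For context, the TIMED_WAITING frame above is the client sleeping between retries inside locateRegionInMeta while it tries to find the region in .META.; with the default hbase.client.retries.number and hbase.client.pause settings, nested retry loops like this can keep the calling thread blocked for minutes before something interrupts it. A minimal sketch, assuming the 0.90 client API and a hypothetical row id (this is an illustration, not the poster's code), of lowering those knobs so a stuck lookup fails fast:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class FastFailGet {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            // Example values only: fewer retries and a shorter pause make a
            // stuck region lookup surface as an error in seconds rather than
            // letting the calling thread sleep for many minutes.
            conf.setInt("hbase.client.retries.number", 3);
            conf.setInt("hbase.client.pause", 500);

            HTable table = new HTable(conf, "employeedata");      // table name taken from the thread
            try {
                Get get = new Get(Bytes.toBytes("some-row-id"));  // hypothetical row id
                Result result = table.get(get);
                System.out.println("empty=" + result.isEmpty());
            } finally {
                table.close();
            }
        }
    }

Whether shorter retries are appropriate depends on the application; the point is only that the sleep seen in the dump is governed by these client-side settings.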
I did not see anything happening on the region server either; the log only had occasional entries like these:

2011-07-10 16:43:53,648 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=6.52 MB, free=788.08 MB, max=794.6 MB, blocks=0, accesses=1080, hits=0, hitRatio=0.00%%, cachingAccesses=0, cachingHits=0, cachingHitsRatio=�%, evictions=0, evicted=0, evictedPerRun=NaN
2011-07-10 16:48:53,649 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=6.52 MB, free=788.08 MB, max=794.6 MB, blocks=0, accesses=1080, hits=0, hitRatio=0.00%%, cachingAccesses=0, cachingHits=0, cachingHitsRatio=�%, evictions=0, evicted=0, evictedPerRun=NaN
2011-07-10 16:53:53,648 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=6.52 MB, free=788.08 MB, max=794.6 MB, blocks=0, accesses=1080, hits=0, hitRatio=0.00%%, cachingAccesses=0, cachingHits=0, cachingHitsRatio=�%, evictions=0, evicted=0, evictedPerRun=NaN

Regards,
Srikanth

-----Original Message-----
From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
Sent: Saturday, July 09, 2011 9:41 PM
To: user@hbase.apache.org
Subject: Re: HBase Read and Write Issues in Multithreaded Environments

You read the requirements section in our docs and you have upped the
ulimits, nprocs, etc?  http://hbase.apache.org/book/os.html

If you know the row, can you deduce the regionserver it's talking to?
(Below is the client failure -- we need to figure out what's up on the
server side.)  Once you've done that, can you check its logs?  See if
you can figure out anything about why it hangs.

Thanks,
St.Ack
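The requirements Stack refers to are the OS limits described at http://hbase.apache.org/book/os.html: HBase and HDFS open many files and spawn many threads, so the file-descriptor (and on some systems process) limits for the user running the daemons usually need to be raised well above the common default of 1024. A sketch of what that can look like in /etc/security/limits.conf, assuming the daemons run as a user named hadoop and using illustrative numbers:

    # /etc/security/limits.conf -- illustrative values; see the HBase book
    # for the recommendations that match your distribution
    hadoop  -  nofile  32768
    hadoop  -  nproc   32000

On Debian/Ubuntu-style systems, pam_limits typically also has to be enabled (e.g. "session required pam_limits.so" in /etc/pam.d/common-session) for these limits to apply to login sessions.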
On Sat, Jul 9, 2011 at 6:14 AM, Srikanth P. Shreenivas wrote:
> Hi St.Ack,
>
> We upgraded to CDH3 (hadoop-0.20-0.20.2+923.21-1.noarch.rpm, hadoop-hbase-0.90.1+15.18-1.noarch.rpm, hadoop-zookeeper-3.3.3+12.1-1.noarch.rpm).
>
> I ran the same test which I was running for the app when it was on CDH2.  The test app posts a request to the web app every 100ms, and the web app reads an HBase record, performs some logic, and saves an audit trail by writing another HBase record.
>
> When our app was running on CDH2, I observed the below issue for every 10 to 15 requests.
> With CDH3, this issue is not happening at all.  So the situation seems to have improved a lot, and our app is a lot more stable.
>
> However, I am still seeing one issue.  There are some requests (around 1%) which are not able to read the record from HBase, and the get call hangs for almost 10 minutes.  This is what I see in the application log:
>
> 2011-07-09 18:27:25,537 [gridgain-#6%authGrid%] ERROR [my.app.HBaseHandler]  - Exception occurred in searchData:
> java.io.IOException: Giving up trying to get region server: thread is interrupted.
>        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1016)
>        at org.apache.hadoop.hbase.client.HTable.get(HTable.java:546)
>
>        <...app specific trace removed...>
>
>        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>        at org.gridgain.grid.util.runnable.GridRunnable.run(GridRunnable.java:194)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:619)
>
> I am running the test against the same record, so all my "get" calls are for the same row id.
>
> It will be of immense help if you can provide some input on whether we are missing some configuration settings, or whether there is a way to get around this.
>
> Thanks,
> Srikanth
>
> -----Original Message-----
> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
> Sent: Wednesday, June 29, 2011 7:48 PM
> To: user@hbase.apache.org
> Subject: Re: HBase Read and Write Issues in Multithreaded Environments
>
> Go to CDH3 if you can.  CDH2 is also old.
> St.Ack
>
> On Wed, Jun 29, 2011 at 7:15 AM, Srikanth P. Shreenivas wrote:
>> Thanks St.Ack for the inputs.
>>
>> Will upgrading to CDH3 help, or is there a version within CDH2 that you recommend we should upgrade to?
>>
>> Regards,
>> Srikanth
>>
>> -----Original Message-----
>> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
>> Sent: Wednesday, June 29, 2011 11:16 AM
>> To: user@hbase.apache.org
>> Subject: Re: HBase Read and Write Issues in Multithreaded Environments
>>
>> Can you upgrade?  That release is > 18 months old.  A bunch has
>> happened in the meantime.
>>
>> For retries exhausted, check what's going on on the remote regionserver
>> that you are trying to write to.  It's probably struggling, and that's
>> why requests are not going through -- or the client missed the fact
>> that the region moved (all stuff that should be working better in the
>> latest hbase).
>>
>> St.Ack
>>
>> On Tue, Jun 28, 2011 at 9:51 PM, Srikanth P. Shreenivas wrote:
>>> Hi,
>>>
>>> We are using an HBase 0.20.3 (hbase-0.20-0.20.3-1.cloudera.noarch.rpm) cluster in distributed mode with Hadoop 0.20.2 (hadoop-0.20-0.20.2+320-1.noarch).
>>> We are using pretty much the default configuration; the only thing we have customized is that we have allocated 4GB of RAM in /etc/hbase-0.20/conf/hbase-env.sh.
>>>
>>> In our setup, we have a web application that reads a record from HBase and writes a record as part of each web request.  The application is hosted in Apache Tomcat 7 and is a stateless web application providing a REST-like web service API.
>>>
>>> We are observing that our reads and writes time out once in a while.  This happens more for writes.
>>> We see the exceptions below in our application logs:
>>>
>>> Exception Type 1 - During Get:
>>> ---------------------------------------
>>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server 10.1.68.36:60020 for region employeedata,be8784ac8b57c45625a03d52be981b88097c2fdc,1308657957879, row 'd51b74eb05e07f96cee0ec556f5d8d161e3281f3', but failed after 10 attempts.
>>> Exceptions:
>>> java.io.IOException: Call to /10.1.68.36:60020 failed on local exception: java.nio.channels.ClosedByInterruptException
>>> java.nio.channels.ClosedByInterruptException
>>> java.nio.channels.ClosedByInterruptException
>>> java.nio.channels.ClosedByInterruptException
>>> java.nio.channels.ClosedByInterruptException
>>> java.nio.channels.ClosedByInterruptException
>>> java.nio.channels.ClosedByInterruptException
>>> java.nio.channels.ClosedByInterruptException
>>> java.nio.channels.ClosedByInterruptException
>>> java.nio.channels.ClosedByInterruptException
>>>
>>>        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1048)
>>>        at org.apache.hadoop.hbase.client.HTable.get(HTable.java:417)
>>>
>>> Exception Type 2 - During Put:
>>> ---------------------------------------------
>>> Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server 10.1.68.34:60020 for region audittable,,1309183872019, row '2a012017120f80a801b28f5f66a83dc2a8882d1b', but failed after 10 attempts.
>>> Exceptions:
>>> java.io.IOException: Call to /10.1.68.34:60020 failed on local exception: java.nio.channels.ClosedByInterruptException
>>> java.io.IOException: Call to /10.1.68.34:60020 failed on local exception: java.nio.channels.ClosedByInterruptException
>>> java.io.IOException: Call to /10.1.68.34:60020 failed on local exception: java.nio.channels.ClosedByInterruptException
>>> java.io.IOException: Call to /10.1.68.34:60020 failed on local exception: java.nio.channels.ClosedByInterruptException
>>> java.io.IOException: Call to /10.1.68.34:60020 failed on local exception: java.nio.channels.ClosedByInterruptException
>>> java.io.IOException: Call to /10.1.68.34:60020 failed on local exception: java.nio.channels.ClosedByInterruptException
>>> java.io.IOException: Call to /10.1.68.34:60020 failed on local exception: java.nio.channels.ClosedByInterruptException
>>> java.io.IOException: Call to /10.1.68.34:60020 failed on local exception: java.nio.channels.ClosedByInterruptException
>>> java.io.IOException: Call to /10.1.68.34:60020 failed on local exception: java.nio.channels.ClosedByInterruptException
>>> java.io.IOException: Call to /10.1.68.34:60020 failed on local exception: java.nio.channels.ClosedByInterruptException
>>>
>>>        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1048)
>>>        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$3.doCall(HConnectionManager.java:1239)
>>>        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$Batch.process(HConnectionManager.java:1161)
>>>        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1247)
>>>        at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:609)
>>>        at org.apache.hadoop.hbase.client.HTable.put(HTable.java:474)
>>>
>>> Any inputs on why this is happening, or how to rectify it, will be of immense help.
>>>
>>> Thanks,
>>> Srikanth
>>>
>>> Srikanth P Shreenivas | Principal Consultant | MindTree Ltd. | Global Village, RVCE Post, Mysore Road, Bangalore-560 059, INDIA | Voice +91 80 26264000 / Fax +91 80 2626 4100 | Mob: 9880141059 | email: srikanth_shreenivas@mindtree.com | www.mindtree.com
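For reference, in the 0.20/0.90 client API an HTable instance is not safe for concurrent use by multiple threads, so a Tomcat-style web app that does a Get and a Put per request typically shares one Configuration and gives each request its own table handle, for example via HTablePool. A minimal sketch of that pattern follows; it assumes the 0.90-era HTablePool API and hypothetical column family/qualifier names (only the table names are taken from the exceptions above), so it is an illustration rather than the poster's actual handler code:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTableInterface;
    import org.apache.hadoop.hbase.client.HTablePool;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class AuditedLookup {
        // One shared configuration and pool per JVM; each pooled HTable
        // is only ever used by one thread at a time.
        private static final Configuration CONF = HBaseConfiguration.create();
        private static final HTablePool POOL = new HTablePool(CONF, 50);

        public byte[] readAndAudit(String rowId) throws Exception {
            HTableInterface employees = POOL.getTable("employeedata");
            HTableInterface audit = POOL.getTable("audittable");
            try {
                // Read the record for this request.
                Result r = employees.get(new Get(Bytes.toBytes(rowId)));
                byte[] value = r.getValue(Bytes.toBytes("d"), Bytes.toBytes("payload")); // hypothetical column

                // Write the audit trail record.
                Put trail = new Put(Bytes.toBytes(rowId));
                trail.add(Bytes.toBytes("d"), Bytes.toBytes("accessedAt"),
                          Bytes.toBytes(System.currentTimeMillis()));
                audit.put(trail);
                return value;
            } finally {
                // Return the tables to the pool (0.90-era API) rather than closing them.
                POOL.putTable(audit);
                POOL.putTable(employees);
            }
        }
    }

A related design note: creating a fresh HBaseConfiguration per request can mean a new client connection and ZooKeeper session per request, which is another common source of stalls under load, so sharing one Configuration as above is usually preferable.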