From: "Kumar, Deepak8"
To: 'user@hbase.apache.org', 'Gary Helmling', 'yuzhihong@gmail.com', 'lars hofhansl'
Subject: RE: Regionserver goes down while endpoint execution
Date: Thu, 14 Mar 2013 17:09:41 +0000

Hi,

It seems the RegionServer was going down because of the huge amount of data. I am now fetching the data in parts, and that is running fine.
I need some more info about the Endpoint execution:

My use case is to fetch the data from HBase for some rowkey range and render it in the UI. Since endpoints are executed in parallel, I am looking to use them.

Suppose I provide the rowkey range rowkey1 to rowkey100 in the endpoint RPC client, and these rowkeys are distributed over 5 regions across 4 region servers. If I fetch 10 records at a time, is there any way to guarantee that they come back in serial order? That is, the first result would be rowkey1 to rowkey10; next time I set the start rowkey to rowkey11 and the fetch would be rowkey11 to rowkey20, irrespective of the regions and region servers.

Regards,
Deepak

-----Original Message-----
From: hv.csuoa@gmail.com [mailto:hv.csuoa@gmail.com] On Behalf Of Himanshu Vashishtha
Sent: Wednesday, March 13, 2013 12:09 PM
To: user@hbase.apache.org
Cc: Gary Helmling; yuzhihong@gmail.com; lars hofhansl
Subject: Re: Regionserver goes down while endpoint execution

On Wed, Mar 13, 2013 at 8:19 AM, Kumar, Deepak8 wrote:
> Thanks guys for assisting. I am still getting the OOM exception. I have one query about endpoints. As an endpoint executes in parallel, if I have a table which is distributed over 101 regions across 5 regionservers, would there be 101 threads of the endpoint executing in parallel?

No and Yes.
The endpoints are not processed as separate threads; they are processed as just another request (via regionserver handlers). Yes, the execution will be in parallel in the sense that a separate client-side call will be used for each of the regions that are in the range you specify.
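On the paging question above: a normal client-side scan (HTable.getScanner(), as suggested elsewhere in this thread) returns rows in row-key order regardless of which region holds them, whereas coprocessorExec hands back one result per region that the client has to merge itself. The usual paging pattern on top of an ordered scan is to remember the last row key of a page and start the next page just after it. Below is a minimal, self-contained sketch of that pattern only. It does not call HBase: a TreeMap stands in for the sorted table, and the names RowKeyPaging, fetchPage and nextStartRow are hypothetical, chosen for illustration. Note that row keys sort lexicographically, so "rowkey10" sorts before "rowkey2"; the sketch assumes zero-padded keys to sidestep that.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical illustration: a TreeMap plays the role of a table sorted by row key.
final class RowKeyPaging {

    /** Returns up to pageSize row keys in [startRow, stopRow), in sorted order. */
    static List<String> fetchPage(NavigableMap<String, String> table,
                                  String startRow, String stopRow, int pageSize) {
        List<String> page = new ArrayList<>();
        for (String rowKey : table.subMap(startRow, true, stopRow, false).keySet()) {
            if (page.size() == pageSize) {
                break;
            }
            page.add(rowKey);
        }
        return page;
    }

    /** Smallest key that sorts strictly after lastRow: append a zero character. */
    static String nextStartRow(String lastRow) {
        return lastRow + "\0";
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (int i = 1; i <= 25; i++) {
            // Zero-padded so that lexicographic order matches numeric order.
            table.put(String.format("rowkey%03d", i), "value" + i);
        }

        String start = "rowkey001";
        String stop = "rowkey999";
        while (true) {
            List<String> page = fetchPage(table, start, stop, 10);
            if (page.isEmpty()) {
                break;
            }
            System.out.println(page.get(0) + " .. " + page.get(page.size() - 1));
            // The next page begins immediately after the last row already returned.
            start = nextStartRow(page.get(page.size() - 1));
        }
    }
}
```

With a real table the same idea would presumably map to setting the scan's start row to the last returned key plus a trailing zero byte and reading at most one page of results per request; that mapping is an assumption here, not something confirmed in this thread.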
>
> Regards,
> Deepak
>
> From: Gary Helmling [mailto:ghelmling@gmail.com]
> Sent: Tuesday, March 12, 2013 2:14 PM
> To: user@hbase.apache.org
> Cc: lars hofhansl; Kumar, Deepak8 [CCC-OT_IT NE]
> Subject: Re: Regionserver goes down while endpoint execution
>
> To expand on what Himanshu said, your endpoint is doing an unbounded scan on the region, so with a region with a lot of rows it's taking more than 60 seconds to run to the region end, which is why the client side of the call is timing out. In addition you're building up an in-memory list of all the values for that qualifier in that region, which could cause you to bump into OOM issues, depending on how big your values are and how sparse the given column qualifier is. If you trigger an OOMException, then the region server would abort.
>
> For this usage specifically, though -- scanning through a single column qualifier for all rows -- you would be better off just doing a normal client-side scan, i.e. HTable.getScanner(). Then you will avoid the client timeout and potential server-side memory issues.
>
> On Tue, Mar 12, 2013 at 9:29 AM, Ted Yu wrote:
> From region server log:
>
> 2013-03-12 03:07:22,605 DEBUG org.apache.hadoop.hdfs.DFSClient: Error making BlockReader. Closing stale Socket[addr=/10.42.105.112,port=50010,localport=54114]
> java.io.EOFException: Premature EOF: no length prefix available
>         at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:162)
>         at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:407)
>
> What version of HBase and hadoop are you using ?
> Do versions of hadoop on Eclipse machine and in your cluster match ?
>
> Cheers
>
> On Tue, Mar 12, 2013 at 4:46 AM, Kumar, Deepak8 wrote:
>> Lars,
>>
>> I am getting the following errors at the datanode & region servers.
>>
>> Regards,
>> Deepak
>>
>> From: Kumar, Deepak8 [CCC-OT_IT NE]
>> Sent: Tuesday, March 12, 2013 3:00 AM
>> To: Kumar, Deepak8 [CCC-OT_IT NE]; 'user@hbase.apache.org'; 'lars hofhansl'
>> Subject: RE: Regionserver goes down while endpoint execution
>>
>> Lars,
>>
>> I get the following errors when I execute the Endpoint RPC client from Eclipse. It seems some of the regions at regionserver vm-8aa9-fe74.nam.nsroot.net are taking more time to respond.
>>
>> Could you guide me on how to fix it? I don't find any option to set hbase.rpc.timeout in the HBase configuration menu of the CDH4 CM server.
>>
>> Regards,
>> Deepak
>>
>> 3/03/12 02:33:12 INFO zookeeper.ClientCnxn: Session establishment complete on server vm-15c2-3bbf.nam.nsroot.net/10.96.172.44:2181, sessionid = 0x53d591b77090026, negotiated timeout = 60000
>> Mar 12, 2013 2:33:13 AM org.apache.hadoop.conf.Configuration warnOnceIfDeprecated
>> WARNING: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
>> Mar 12, 2013 2:44:00 AM org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation processExecs
>> WARNING: Error executing for row 153299:1362780381523:2932572079500658:vm-ab1f-dd21.nam.nsroot.net:
>> java.util.concurrent.ExecutionException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=10, exceptions:
>> Tue Mar 12 02:34:15 EDT 2013, org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, java.net.SocketTimeoutException: Call to vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/150.110.96.212:2271 remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
>> Tue Mar 12 02:35:16 EDT 2013, org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, java.net.SocketTimeoutException: Call to vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/150.110.96.212:2403 remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
>> Tue Mar 12 02:36:18 EDT 2013, org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, java.net.SocketTimeoutException: Call to vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/150.110.96.212:2465 remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
>> Tue Mar 12 02:37:20 EDT 2013, org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, java.net.SocketTimeoutException: Call to vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/150.110.96.212:2500 remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
>> Tue Mar 12 02:38:22 EDT 2013, org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, java.net.SocketTimeoutException: Call to vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/150.110.96.212:2538 remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
>> Tue Mar 12 02:39:25 EDT 2013, org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, java.net.SocketTimeoutException: Call to vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/150.110.96.212:2572 remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
>> Tue Mar 12 02:40:30 EDT 2013, org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, java.net.SocketTimeoutException: Call to vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/150.110.96.212:2606 remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
>> Tue Mar 12 02:41:34 EDT 2013, org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, java.net.SocketTimeoutException: Call to vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/150.110.96.212:2640 remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
>> Tue Mar 12 02:42:43 EDT 2013, org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, java.net.SocketTimeoutException: Call to vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/150.110.96.212:2677 remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
>> Tue Mar 12 02:44:00 EDT 2013, org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, java.net.SocketTimeoutException: Call to vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/150.110.96.212:2842 remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
>>
>>         at java.util.concurrent.FutureTask$Sync.innerGet(Unknown Source)
>>         at java.util.concurrent.FutureTask.get(Unknown Source)
>>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processExecs(HConnectionManager.java:1466)
>>         at org.apache.hadoop.hbase.client.HTable.coprocessorExec(HTable.java:1577)
>>         at org.apache.hadoop.hbase.client.HTable.coprocessorExec(HTable.java:1557)
>>         at com.citi.sponge.hbase.endpoint.HBaseEndPointClientForElfLog.main(HBaseEndPointClientForElfLog.java:33)
>> Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=10, exceptions:
>> Tue Mar 12 02:34:15 EDT 2013, org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, java.net.SocketTimeoutException: Call to vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/150.110.96.212:2271 remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
>> Tue Mar 12 02:35:16 EDT 2013, org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, java.net.SocketTimeoutException: Call to vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/150.110.96.212:2403 remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
>> Tue Mar 12 02:36:18 EDT 2013, org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, java.net.SocketTimeoutException: Call to vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/150.110.96.212:2465 remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
>> Tue Mar 12 02:37:20 EDT 2013, org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, java.net.SocketTimeoutException: Call to vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/150.110.96.212:2500 remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
>> Tue Mar 12 02:38:22 EDT 2013, org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, java.net.SocketTimeoutException: Call to vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/150.110.96.212:2538 remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
>> Tue Mar 12 02:39:25 EDT 2013, org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, java.net.SocketTimeoutException: Call to vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/150.110.96.212:2572 remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
>> Tue Mar 12 02:40:30 EDT 2013, org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, java.net.SocketTimeoutException: Call to vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/150.110.96.212:2606 remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
>> Tue Mar 12 02:41:34 EDT 2013, org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, java.net.SocketTimeoutException: Call to vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/150.110.96.212:2640 remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
>> Tue Mar 12 02:42:43 EDT 2013, org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, java.net.SocketTimeoutException: Call to vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/150.110.96.212:2677 remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
>> Tue Mar 12 02:44:00 EDT 2013, org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, java.net.SocketTimeoutException: Call to vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/150.110.96.212:2842 remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
>>
>>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1345)
>>         at org.apache.hadoop.hbase.ipc.ExecRPCInvoker.invoke(ExecRPCInvoker.java:79)
>>         at $Proxy8.getValues(Unknown Source)
>>         at com.citi.sponge.hbase.endpoint.HBaseEndPointClientForElfLog$1.call(HBaseEndPointClientForElfLog.java:38)
>>         at com.citi.sponge.hbase.endpoint.HBaseEndPointClientForElfLog$1.call(HBaseEndPointClientForElfLog.java:1)
>>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$4.call(HConnectionManager.java:1454)
>>         at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
>>         at java.util.concurrent.FutureTask.run(Unknown Source)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>>         at java.lang.Thread.run(Unknown Source)
>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=10, exceptions:
>> Tue Mar 12 02:34:15 EDT 2013, org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, java.net.SocketTimeoutException: Call to vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/150.110.96.212:2271 remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
>> Tue Mar 12 02:35:16 EDT 2013, org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, java.net.SocketTimeoutException: Call to vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/150.110.96.212:2403 remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
>> Tue Mar 12 02:36:18 EDT 2013, org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, java.net.SocketTimeoutException: Call to vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/150.110.96.212:2465 remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
>> Tue Mar 12 02:37:20 EDT 2013, org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, java.net.SocketTimeoutException: Call to vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/150.110.96.212:2500 remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
>> Tue Mar 12 02:38:22 EDT 2013, org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, java.net.SocketTimeoutException: Call to vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/150.110.96.212:2538 remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
>> Tue Mar 12 02:39:25 EDT 2013, org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, java.net.SocketTimeoutException: Call to vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/150.110.96.212:2572 remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
>> Tue Mar 12 02:40:30 EDT 2013, org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, java.net.SocketTimeoutException: Call to vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/150.110.96.212:2606 remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
>> Tue Mar 12 02:41:34 EDT 2013, org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, java.net.SocketTimeoutException: Call to vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/150.110.96.212:2640 remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
>> Tue Mar 12 02:42:43 EDT 2013, org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, java.net.SocketTimeoutException: Call to vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/150.110.96.212:2677 remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
>> Tue Mar 12 02:44:00 EDT 2013, org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, java.net.SocketTimeoutException: Call to vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/150.110.96.212:2842 remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
>>
>>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1345)
>>         at org.apache.hadoop.hbase.ipc.ExecRPCInvoker.invoke(ExecRPCInvoker.java:79)
>>         at $Proxy8.getValues(Unknown Source)
>>         at com.citi.sponge.hbase.endpoint.HBaseEndPointClientForElfLog$1.call(HBaseEndPointClientForElfLog.java:38)
>>         at com.citi.sponge.hbase.endpoint.HBaseEndPointClientForElfLog$1.call(HBaseEndPointClientForElfLog.java:1)
>>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$4.call(HConnectionManager.java:1454)
>>         at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
>>         at java.util.concurrent.FutureTask.run(Unknown Source)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>>         at java.lang.Thread.run(Unknown Source)
>>
>> From: Kumar, Deepak8 [CCC-OT_IT NE]
>> Sent: Tuesday, March 12, 2013 2:27 AM
>> To: 'user@hbase.apache.org'; 'lars hofhansl'
>> Subject: RE: Regionserver goes down while endpoint execution
>>
>> Lars,
>>
>> Thanks for your quick response. There is not much info in the region server log.
>> I am again executing it with DEBUG log level in region servers.
>>
>> Here is the endpoint code:
>>
>> public class ColumnAggregationEndpoint extends BaseEndpointCoprocessor
>>     implements ColumnAggregationProtocol {
>>
>>   @Override
>>   public List<String> getValues(byte[] family, byte[] qualifier, int batchSize, int cacheSize)
>>       throws IOException {
>>     // aggregate at each region
>>     Scan scan = new Scan();
>>     scan.addColumn(family, qualifier);
>>     scan.setCaching(cacheSize);
>>     scan.setBatch(batchSize);
>>     List<String> values = new ArrayList<String>();
>>     RegionCoprocessorEnvironment environment =
>>         (RegionCoprocessorEnvironment) getEnvironment();
>>
>>     InternalScanner scanner = environment.getRegion().getScanner(scan);
>>     try {
>>       List<KeyValue> curVals = new ArrayList<KeyValue>();
>>       boolean hasMore = false;
>>       do {
>>         curVals.clear();
>>         hasMore = scanner.next(curVals);
>>         KeyValue kv = curVals.get(0);
>>         values.add(Bytes.toString(kv.getValue()));
>>       } while (hasMore);
>>     } finally {
>>       scanner.close();
>>     }
>>     return values;
>>   }
>> }
>>
>> The RPC client to invoke the Endpoint is as follows:
>>
>> public class HBaseEndPointClientForElfLog {
>>   public static void main(String[] args) {
>>     try {
>>       Configuration conf = HBaseConfiguration.create();
>>       conf.set(
>>           "hbase.zookeeper.quorum",
>>           "vm-ab1f-dd21.nam.nsroot.net,vm-cb03-2277.nam.nsroot.net,vm-15c2-3bbf.nam.nsroot.net");
>>       String tableName = "elf_log";
>>       final String columnFamily = "content";
>>       final String columnQualifier = "logFileName";
>>       final String startRowKey =
>>           "153299:1362780381523:2932572079500658:vm-ab1f-dd21.nam.nsroot.net:";
>>       final String endRowKey = "153299:1362953388000";
>>       HTableInterface table = new HTable(conf, tableName);
>>       Scan scan;
>>       Map<byte[], List<String>> results;
>>
>>       // scan: for all regions
>>       scan = new Scan();
>>
>>       results = table.coprocessorExec(ColumnAggregationProtocol.class,
>>           startRowKey.getBytes(), endRowKey.getBytes(),
>>           new Batch.Call<ColumnAggregationProtocol, List<String>>() {
>>             public List<String> call(ColumnAggregationProtocol instance)
>>                 throws IOException {
>>               return instance.getValues(columnFamily.getBytes(),
>>                   columnQualifier.getBytes(), 2, 5);
>>             }
>>           });
>>
>>       for (Map.Entry<byte[], List<String>> e : results.entrySet()) {
>>         System.out.println("Size of list returned: " + e.getValue().size());
>>         for (String singleVal : e.getValue()) {
>>           System.out.println(singleVal);
>>         }
>>       }
>>     } catch (Throwable throwable) {
>>       throwable.printStackTrace();
>>     }
>>   }
>> }
>>
>> Regards,
>> Deepak
>>
>> -----Original Message-----
>> From: lars hofhansl [mailto:larsh@apache.org]
>> Sent: Tuesday, March 12, 2013 2:01 AM
>> To: user@hbase.apache.org
>> Subject: Re: Regionserver goes down while endpoint execution
>>
>> What does the region server log say?
>>
>> Endpoints do not run in a sandbox. You can call System.exit(...) and your RegionServer will happily exit.
>> If you can, please show us your endpoint code.
>>
>> -- Lars
>>
>> ________________________________
>> From: "Kumar, Deepak8"
>> To: "'user@hbase.apache.org'"
>> Sent: Monday, March 11, 2013 10:51 PM
>> Subject: Regionserver goes down while endpoint execution
>>
>> Hi,
>> I have a table in HBase which has more than 5GB of data; it is distributed over 101 regions on 5 regionservers.
>>
>> When I execute an endpoint which is supposed to fetch a column qualifier value using an endpoint RPC client, the region server goes down. The HBase master log says "Can't connect to region, retrying.." The same endpoint works fine for tables which have 300 records.
>>
>> Could you please guide me on the reason for the regionserver going down?
>>
>> Regards,
>> Deepak