From: Anoop Sam John
To: user@hbase.apache.org, lars hofhansl
Subject: RE: .META. region server DDOSed by too many clients
Date: Thu, 6 Dec 2012 04:25:48 +0000

>is the META table cached just like other tables

Yes Varun, I think so.

-Anoop-

________________________________________
From: Varun Sharma [varun@pinterest.com]
Sent: Thursday, December 06, 2012 6:10 AM
To: user@hbase.apache.org; lars hofhansl
Subject: Re: .META. region server DDOSed by too many clients

We only see this on the .META. region, not otherwise...

On Wed, Dec 5, 2012 at 4:37 PM, Varun Sharma wrote:

> I see, but is this pointing to the fact that we are heading to disk for
> scanning META? If yes, that would be pretty bad, no? Currently I am
> trying to see if the freeze coincides with the block cache being full
> (we have an in-memory column). Is the META table cached just like other
> tables?
>
> Varun
>
>
> On Wed, Dec 5, 2012 at 4:20 PM, lars hofhansl wrote:
>
>> Looks like you're running into HBASE-5898.
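For context: as I understand HBASE-5898, HFileReaderV2.readBlock in 0.94.0
takes a per-block IdLock before it consults the block cache, so when many
handlers hit the same hot .META. block they all queue on one lock entry even
when the block is already cached; the proposed fix is double-checked locking
that tries the cache first. A minimal sketch of that shape (illustrative
only -- this is not the actual HBase code, and the class and helper names
here are made up):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of double-checked locking around a block cache: a cache hit
// never touches the per-block lock, so readers of an already-cached hot
// block do not serialize. Without the fast path, every reader of the
// same block queues on one lock, which is what the jstack below shows.
class BlockReaderSketch {
    private final ConcurrentHashMap<Long, byte[]> blockCache =
            new ConcurrentHashMap<Long, byte[]>();
    private final ConcurrentHashMap<Long, ReentrantLock> idLocks =
            new ConcurrentHashMap<Long, ReentrantLock>();

    byte[] readBlock(long offset) {
        byte[] cached = blockCache.get(offset);
        if (cached != null) {
            return cached;                   // fast path: no lock on a cache hit
        }
        ReentrantLock lock = lockFor(offset);
        lock.lock();                         // one loader per block; others queue here
        try {
            cached = blockCache.get(offset); // re-check under the lock
            if (cached != null) {
                return cached;
            }
            byte[] loaded = readFromDisk(offset);
            blockCache.put(offset, loaded);
            return loaded;
        } finally {
            lock.unlock();
        }
    }

    private ReentrantLock lockFor(long offset) {
        // A real implementation (like HBase's IdLock) also recycles entries.
        ReentrantLock fresh = new ReentrantLock();
        ReentrantLock existing = idLocks.putIfAbsent(offset, fresh);
        return existing != null ? existing : fresh;
    }

    private byte[] readFromDisk(long offset) {
        return new byte[0];                  // stand-in for the HDFS read
    }
}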
>>
>> ----- Original Message -----
>> From: Varun Sharma
>> To: user@hbase.apache.org
>> Cc:
>> Sent: Wednesday, December 5, 2012 3:51 PM
>> Subject: .META. region server DDOSed by too many clients
>>
>> Hi,
>>
>> I am running hbase 0.94.0 and I have a significant write load being put
>> on a table with 98 regions on a 15 node cluster; this write load also
>> comes from a very large number of clients (~1000). I am running with 10
>> priority IPC handlers and 200 regular IPC handlers. It seems the region
>> server holding .META. is DDOSed: all 200 handlers are busy serving the
>> .META. region, and they are all locked on one object. The jstack for the
>> region server is:
>>
>> "IPC Server handler 182 on 60020" daemon prio=10 tid=0x00007f329872c800
>> nid=0x4401 waiting on condition [0x00007f328807f000]
>>    java.lang.Thread.State: WAITING (parking)
>>      at sun.misc.Unsafe.park(Native Method)
>>      - parking to wait for <0x0000000542d72e30> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
>>      at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>>      at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:838)
>>      at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:871)
>>      at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1201)
>>      at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
>>      at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
>>      at java.util.concurrent.ConcurrentHashMap$Segment.put(ConcurrentHashMap.java:445)
>>      at java.util.concurrent.ConcurrentHashMap.putIfAbsent(ConcurrentHashMap.java:925)
>>      at org.apache.hadoop.hbase.util.IdLock.getLockEntry(IdLock.java:71)
>>      at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:290)
>>      at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.seekToDataBlock(HFileBlockIndex.java:213)
>>      at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:455)
>>      at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:493)
>>      at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:242)
>>      at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:167)
>>      at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:54)
>>      at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:299)
>>      at org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:244)
>>      at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:521)
>>      - locked <0x000000063b4965d0> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
>>      at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:402)
>>      - locked <0x000000063b4965d0> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
>>      at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:127)
>>      at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3354)
>>      at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3310)
>>      - locked <0x0000000523c211e0> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
>>      at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3327)
>>      - locked <0x0000000523c211e0> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
>>      at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4066)
>>      at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4039)
>>      at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1941)
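(A side note on the handler counts mentioned above: in 0.94 they should map
to the two settings in the sketch below. The property names are from memory,
so please verify them against hbase-default.xml for your version.)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class HandlerConfigSketch {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // Regular IPC handler pool (the 200 handlers in the report).
        conf.setInt("hbase.regionserver.handler.count", 200);
        // Priority handlers reserved for .META./-ROOT- requests (the 10
        // priority handlers in the report); the 0.94 default is 10.
        conf.setInt("hbase.regionserver.metahandler.count", 10);
        System.out.println("handlers=" +
                conf.getInt("hbase.regionserver.handler.count", -1));
    }
}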
>>
>> The client side trace shows that we are looking for the META region:
>>
>> "thrift-worker-3499" daemon prio=10 tid=0x00007f789dd98800 nid=0xb52
>> waiting for monitor entry [0x00007f778672d000]
>>    java.lang.Thread.State: BLOCKED (on object monitor)
>>      at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:943)
>>      - waiting to lock <0x0000000707978298> (a java.lang.Object)
>>      at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:836)
>>      at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1482)
>>      at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1367)
>>      at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:729)
>>      - locked <0x000000070821d5a0> (a org.apache.hadoop.hbase.client.HTable)
>>      at org.apache.hadoop.hbase.client.HTable.get(HTable.java:698)
>>      at org.apache.hadoop.hbase.client.HTablePool$PooledHTable.get(HTablePool.java:371)
>>
>> On the RS page, I see 68 million read requests for the META region, while
>> for the other 98 regions we have done about 20 million write requests in
>> total; regions have not moved around at all and no crashes have happened.
>> Why do we have such an incredible number of scans over META, and is there
>> something I can do about this issue?
>>
>> Varun
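One client-side thing worth checking with ~1000 clients: the region-location
cache lives in the shared HConnection, so once it is warm, normal gets and
puts should not need to touch .META. at all; a steady stream of
locateRegionInMeta calls usually means the cache is being missed or
invalidated (for example after a NotServingRegionException). Sharing one
Configuration and one HTablePool per client JVM keeps that cache shared
across all worker threads. A rough sketch (the class name and pool size are
illustrative):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.HTablePool;
import org.apache.hadoop.hbase.client.Result;

// One Configuration and one HTablePool per JVM: every worker thread
// checks a table out of the pool, so threads share the underlying
// HConnection and its cached .META. lookups instead of contending on
// a single locked HTable instance (as in the client trace above).
public class SharedPoolSketch {
    private static final Configuration CONF = HBaseConfiguration.create();
    private static final HTablePool POOL = new HTablePool(CONF, 100);

    public static Result get(String table, byte[] row) throws IOException {
        HTableInterface t = POOL.getTable(table);
        try {
            return t.get(new Get(row));
        } finally {
            t.close();   // in 0.94 close() returns the table to the pool
        }
    }
}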