Subject: Re: .META. region server DDOSed by too many clients
From: Varun Sharma <varun@pinterest.com>
To: user@hbase.apache.org, lars hofhansl
Date: Wed, 5 Dec 2012 16:37:43 -0800

I see, but does this point to the fact that we are going to disk when
scanning .META.? If so, that would be pretty bad, no? Currently I am trying
to see whether the freeze coincides with the block cache being full (we
have an in-memory column family). Is the .META. table cached just like
other tables?
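
For context, the in-memory column family I mentioned is just a family
created with the IN_MEMORY flag, along the lines of the sketch below (the
table name, family name and class name are made up, not our real schema).
My understanding is that IN_MEMORY only gives a family's blocks priority in
the LRU block cache rather than pinning them, and I believe .META.'s own
family is flagged the same way by default, which is part of why I am asking.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class InMemoryFamilySketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    // Illustrative table and family names only.
    HTableDescriptor desc = new HTableDescriptor("example_table");
    HColumnDescriptor fam = new HColumnDescriptor("d");
    fam.setInMemory(true); // priority in the LRU block cache, not a pin
    desc.addFamily(fam);

    admin.createTable(desc);
    admin.close();
  }
}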

Varun

On Wed, Dec 5, 2012 at 4:20 PM, lars hofhansl wrote:

> Looks like you're running into HBASE-5898.
>
>
> ----- Original Message -----
> From: Varun Sharma
> To: user@hbase.apache.org
> Cc:
> Sent: Wednesday, December 5, 2012 3:51 PM
> Subject: .META. region server DDOSed by too many clients
>
> Hi,
>
> I am running hbase 0.94.0 and I have a significant write load being put on
> a table with 98 regions on a 15-node cluster; this write load also comes
> from a very large number of clients (~1000). I am running with 10 priority
> IPC handlers and 200 regular IPC handlers. It seems the region server
> holding .META. is DDOSed: all 200 handlers are busy serving the .META.
> region and they are all locked on one object. The jstack for the region
> server is below.
>
> "IPC Server handler 182 on 60020" daemon prio=10 tid=0x00007f329872c800 nid=0x4401 waiting on condition [0x00007f328807f000]
>    java.lang.Thread.State: WAITING (parking)
>      at sun.misc.Unsafe.park(Native Method)
>      - parking to wait for <0x0000000542d72e30> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
>      at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>      at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:838)
>      at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:871)
>      at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1201)
>      at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
>      at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
>      at java.util.concurrent.ConcurrentHashMap$Segment.put(ConcurrentHashMap.java:445)
>      at java.util.concurrent.ConcurrentHashMap.putIfAbsent(ConcurrentHashMap.java:925)
>      at org.apache.hadoop.hbase.util.IdLock.getLockEntry(IdLock.java:71)
>      at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:290)
>      at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.seekToDataBlock(HFileBlockIndex.java:213)
>      at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:455)
>      at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:493)
>      at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:242)
>      at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:167)
>      at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:54)
>      at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:299)
>      at org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:244)
>      at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:521)
>      - locked <0x000000063b4965d0> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
>      at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:402)
>      - locked <0x000000063b4965d0> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
>      at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:127)
>      at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3354)
>      at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3310)
>      - locked <0x0000000523c211e0> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
>      at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3327)
>      - locked <0x0000000523c211e0> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
>      at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4066)
>      at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4039)
>      at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1941)
>
> The client-side trace shows that we are looking for the .META. region:
>
> "thrift-worker-3499" daemon prio=10 tid=0x00007f789dd98800 nid=0xb52 waiting for monitor entry [0x00007f778672d000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>      at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:943)
>      - waiting to lock <0x0000000707978298> (a java.lang.Object)
>      at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:836)
>      at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1482)
>      at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1367)
>      at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:729)
>      - locked <0x000000070821d5a0> (a org.apache.hadoop.hbase.client.HTable)
>      at org.apache.hadoop.hbase.client.HTable.get(HTable.java:698)
>      at org.apache.hadoop.hbase.client.HTablePool$PooledHTable.get(HTablePool.java:371)
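>
> For what it's worth, our thrift workers get their HTable handles from an
> HTablePool, roughly like the sketch below (the pool size, table name and
> class name are illustrative, not our actual code). My understanding is
> that region locations read from .META. are cached inside the shared
> HConnection, so steady-state requests should mostly avoid .META. once that
> cache is warm.
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.client.Get;
> import org.apache.hadoop.hbase.client.HTableInterface;
> import org.apache.hadoop.hbase.client.HTablePool;
> import org.apache.hadoop.hbase.client.Result;
>
> public class PooledGetSketch {
>   // One shared pool per client process; the underlying HConnection caches
>   // region locations, so .META. should only be consulted on a cache miss.
>   private static final Configuration CONF = HBaseConfiguration.create();
>   private static final HTablePool POOL = new HTablePool(CONF, 100);
>
>   public static Result doGet(byte[] row) throws Exception {
>     HTableInterface table = POOL.getTable("example_table");
>     try {
>       return table.get(new Get(row));
>     } finally {
>       table.close(); // in 0.94 this returns the pooled table to the pool
>     }
>   }
> }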
>
> On the RS page, I see 68 million read requests for the .META. region,
> while for the other 98 regions we have done about 20 million write
> requests in total. Regions have not moved around at all and no crashes
> have happened. Why do we have such an incredible number of scans over
> .META., and is there something I can do about this issue?
>
> Varun