hbase-user mailing list archives

From ramkrishna vasudevan <ramkrishna.s.vasude...@gmail.com>
Subject Re: .META. region server DDOSed by too many clients
Date Thu, 06 Dec 2012 11:52:39 GMT
Hmmm.. yes. If you look at the interactions on the JIRA, Stack felt there was
something else going on there. Check your Hadoop-side logs and thread dumps;
that may tell you if your datanodes are a bit lazy. :)

Regards
Ram
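
Thread dumps are usually taken with jstack <pid> against the suspect DataNode or
RegionServer process; below is a minimal sketch of the programmatic equivalent,
assuming a dump of the current JVM is enough (the class name is made up for
illustration).

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;

    // Minimal sketch: print the stack of every thread in the current JVM,
    // roughly what `jstack <pid>` reports from outside the process.
    public class ThreadDumpSketch {
        public static void main(String[] args) {
            ThreadMXBean threads = ManagementFactory.getThreadMXBean();
            // true, true => also report locked monitors and ownable synchronizers
            for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
                System.out.print(info);
            }
        }
    }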

On Thu, Dec 6, 2012 at 4:59 PM, Varun Sharma <varun@pinterest.com> wrote:

> I see - I am going to try the patch then - it looks like all the threads
> are stuck contending for the lock on the same block id. The cache hit
> ratio is pretty high. Also, the server has been in this state for the past
> hour - I don't think it should take an hour to load one HDFS block - and I
> am seeing the issue repeatedly. It looks like something is probably wrong
> with the locking mechanism when you have a higher number of IPC handlers,
> like 200.
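
The pattern in the jstack quoted further down is per-block-id locking; a sketch
of that idea follows (simplified, not HBase's actual IdLock code, and the names
are illustrative). With 200 handlers all wanting the same hot .META. block,
everyone queues behind one ReentrantLock, so the gets are effectively
serialized.

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.locks.ReentrantLock;

    // Simplified sketch in the spirit of org.apache.hadoop.hbase.util.IdLock
    // from the jstack: one lock per block offset, shared by every reader of
    // that block.
    class PerBlockLockSketch {
        private final ConcurrentHashMap<Long, ReentrantLock> locks =
            new ConcurrentHashMap<Long, ReentrantLock>();

        void readBlock(long blockOffset) {
            ReentrantLock lock = locks.get(blockOffset);
            if (lock == null) {
                ReentrantLock candidate = new ReentrantLock();
                ReentrantLock existing = locks.putIfAbsent(blockOffset, candidate);
                lock = (existing != null) ? existing : candidate;
            }
            lock.lock();   // with one hot .META. block, the other handlers park here
            try {
                // read the block from the cache or from HDFS
            } finally {
                lock.unlock();
            }
        }
    }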
>
> On Thu, Dec 6, 2012 at 2:59 AM, ramkrishna vasudevan <
> ramkrishna.s.vasudevan@gmail.com> wrote:
>
> > Actually, when we observed this, our block cache was OFF... If possible,
> > try applying your patch and see what happens.
> > If you have more memory, try increasing the ratio allocated to the block
> > cache.
> >
> > Regards
> > Ram
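
As a rough illustration of the block cache ratio Ram mentions:
hfile.block.cache.size is the fraction of the RegionServer heap given to the
block cache, and it is normally set in hbase-site.xml on the servers. The 0.4
value below is only an example, not a recommendation.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    // Illustrative only: raise the fraction of heap given to the block cache.
    public class BlockCacheRatioExample {
        public static void main(String[] args) {
            Configuration conf = HBaseConfiguration.create();
            conf.setFloat("hfile.block.cache.size", 0.4f);   // example value
            System.out.println("block cache fraction = "
                + conf.getFloat("hfile.block.cache.size", 0.25f));
        }
    }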
> >
> > On Thu, Dec 6, 2012 at 4:02 PM, Varun Sharma <varun@pinterest.com> wrote:
> >
> > > Hi Ram,
> > >
> > > Yes, BlockCache is on, but there is another in-memory column which might
> > > be preempting stuff from the block cache. So we might be hitting more
> > > disk seeks - I see that you have seen this trace before on HBASE-5898 -
> > > did that issue resolve things for you?
> > >
> > > Thanks
> > > Varun
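
For reference, an IN_MEMORY column family like the one Varun mentions is
declared roughly as below (the table and family names are made up for
illustration). In-memory families get a higher retention priority in the LRU
block cache, which is why they can crowd out other blocks.

    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;

    // Hypothetical table/family names, just to show the IN_MEMORY flag.
    public class InMemoryFamilyExample {
        public static void main(String[] args) {
            HTableDescriptor table = new HTableDescriptor("example_table");
            HColumnDescriptor family = new HColumnDescriptor("f");
            // Blocks of this family are kept with higher priority in the block cache.
            family.setInMemory(true);
            table.addFamily(family);
            System.out.println(table);
        }
    }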
> > >
> > > On Wed, Dec 5, 2012 at 10:04 PM, ramkrishna vasudevan <
> > > ramkrishna.s.vasudevan@gmail.com> wrote:
> > >
> > > > Is the block cache ON? Check out HBASE-5898.
> > > >
> > > > Regards
> > > > Ram
> > > >
> > > > On Thu, Dec 6, 2012 at 9:55 AM, Anoop Sam John <anoopsj@huawei.com> wrote:
> > > >
> > > > >
> > > > > >is the META table cached just like other tables
> > > > > Yes Varun I think so.
> > > > >
> > > > > -Anoop-
> > > > > ________________________________________
> > > > > From: Varun Sharma [varun@pinterest.com]
> > > > > Sent: Thursday, December 06, 2012 6:10 AM
> > > > > To: user@hbase.apache.org; lars hofhansl
> > > > > Subject: Re: .META. region server DDOSed by too many clients
> > > > >
> > > > > We only see this on the .META. region, not otherwise...
> > > > >
> > > > > > On Wed, Dec 5, 2012 at 4:37 PM, Varun Sharma <varun@pinterest.com> wrote:
> > > > >
> > > > > > I see, but is this pointing to the fact that we are heading to
> > > > > > disk for scanning META? If yes, that would be pretty bad, no?
> > > > > > Currently I am trying to see if the freeze coincides with the
> > > > > > block cache being full (we have an in-memory column) - is the
> > > > > > META table cached just like other tables?
> > > > > >
> > > > > > Varun
> > > > > >
> > > > > >
> > > > > > On Wed, Dec 5, 2012 at 4:20 PM, lars hofhansl <lhofhansl@yahoo.com> wrote:
> > > > > >
> > > > > >> Looks like you're running into HBASE-5898.
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >> ----- Original Message -----
> > > > > >> From: Varun Sharma <varun@pinterest.com>
> > > > > >> To: user@hbase.apache.org
> > > > > >> Cc:
> > > > > >> Sent: Wednesday, December 5, 2012 3:51 PM
> > > > > >> Subject: .META. region server DDOSed by too many clients
> > > > > >>
> > > > > >> Hi,
> > > > > >>
> > > > > >> I am running hbase 0.94.0 and I have a significant write load
> > > > > >> being put on a table with 98 regions on a 15-node cluster - this
> > > > > >> write load also comes from a very large number of clients (~1000).
> > > > > >> I am running with 10 priority IPC handlers and 200 regular IPC
> > > > > >> handlers (see the configuration sketch at the end of this message).
> > > > > >> It seems the region server holding .META. is DDOSed. All 200
> > > > > >> handlers are busy serving the .META. region and they are all locked
> > > > > >> on one object. The jstack for the region server is here:
> > > > > >>
> > > > > >> "IPC Server handler 182 on 60020" daemon prio=10
> > > > tid=0x00007f329872c800
> > > > > >> nid=0x4401 waiting on condition [0x00007f328807f000]
> > > > > >>    java.lang.Thread.State: WAITING (parking)
> > > > > >>         at sun.misc.Unsafe.park(Native Method)
> > > > > >>         - parking to wait for  <0x0000000542d72e30>
(a
> > > > > >> java.util.concurrent.locks.ReentrantLock$NonfairSync)
> > > > > >>         at
> > > > > >>
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> > > > > >>         at
> > > > > >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:838)
> > > > > >>         at
> > > > > >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:871)
> > > > > >>         at
> > > > > >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1201)
> > > > > >>         at
> > > > > >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
> > > > > >>         at
> > > > > >>
> > > java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
> > > > > >>         at
> > > > > >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> java.util.concurrent.ConcurrentHashMap$Segment.put(ConcurrentHashMap.java:445)
> > > > > >>         at
> > > > > >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> java.util.concurrent.ConcurrentHashMap.putIfAbsent(ConcurrentHashMap.java:925)
> > > > > >>         at
> > > > > >> org.apache.hadoop.hbase.util.IdLock.getLockEntry(IdLock.java:71)
> > > > > >>         at
> > > > > >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:290)
> > > > > >>         at
> > > > > >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.seekToDataBlock(HFileBlockIndex.java:213)
> > > > > >>         at
> > > > > >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:455)
> > > > > >>         at
> > > > > >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:493)
> > > > > >>         at
> > > > > >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:242)
> > > > > >>         at
> > > > > >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:167)
> > > > > >>         at
> > > > > >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:54)
> > > > > >>         at
> > > > > >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:299)
> > > > > >>         at
> > > > > >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:244)
> > > > > >>         at
> > > > > >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:521)
> > > > > >>         - locked <0x000000063b4965d0> (a
> > > > > >> org.apache.hadoop.hbase.regionserver.StoreScanner)
> > > > > >>         at
> > > > > >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:402)
> > > > > >>         - locked <0x000000063b4965d0> (a
> > > > > >> org.apache.hadoop.hbase.regionserver.StoreScanner)
> > > > > >>         at
> > > > > >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:127)
> > > > > >>         at
> > > > > >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3354)
> > > > > >>         at
> > > > > >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3310)
> > > > > >>         - locked <0x0000000523c211e0> (a
> > > > > >> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
> > > > > >>         at
> > > > > >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3327)
> > > > > >>         - locked <0x0000000523c211e0> (a
> > > > > >> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
> > > > > >>         at
> > > > > >>
> > org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4066)
> > > > > >>         at
> > > > > >>
> > org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4039)
> > > > > >>         at
> > > > > >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1941)
> > > > > >>
> > > > > >> The client side trace shows that we are looking for the META region.
> > > > > >>
> > > > > >> thrift-worker-3499" daemon prio=10 tid=0x00007f789dd98800
> > nid=0xb52
> > > > > >> waiting
> > > > > >> for monitor entry [0x00007f778672d000]
> > > > > >>    java.lang.Thread.State: BLOCKED (on object monitor)
> > > > > >>         at
> > > > > >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:943)
> > > > > >>         - waiting to lock <0x0000000707978298> (a
> > java.lang.Object)
> > > > > >>         at
> > > > > >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:836)
> > > > > >>         at
> > > > > >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1482)
> > > > > >>         at
> > > > > >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1367)
> > > > > >>         at
> > > > org.apache.hadoop.hbase.client.HTable.batch(HTable.java:729)
> > > > > >>         - locked <0x000000070821d5a0> (a
> > > > > >> org.apache.hadoop.hbase.client.HTable)
> > > > > >>         at
> > > org.apache.hadoop.hbase.client.HTable.get(HTable.java:698)
> > > > > >>         at
> > > > > >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.client.HTablePool$PooledHTable.get(HTablePool.java:371)
> > > > > >>
> > > > > >> On the RS page, I see 68 million read requests for the META region,
> > > > > >> while for the other 98 regions we have done about 20 million write
> > > > > >> requests in total - regions have not moved around at all and no
> > > > > >> crashes have happened. Why do we have such an incredible number of
> > > > > >> scans over META, and is there something I can do about this issue?
> > > > > >>
> > > > > >> Varun
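
A sketch of the handler configuration described at the top of this message; in
practice these values live in hbase-site.xml on the region servers. The key
names are the 0.94-era ones as best I recall, so treat them as an assumption to
verify against your version.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    // Mirrors the setup described above: 200 regular RPC handlers plus 10
    // priority handlers (the ones that serve .META./-ROOT- requests).
    // Key names assumed from the 0.94 line; verify before relying on them.
    public class HandlerCountExample {
        public static void main(String[] args) {
            Configuration conf = HBaseConfiguration.create();
            conf.setInt("hbase.regionserver.handler.count", 200);
            conf.setInt("hbase.regionserver.metahandler.count", 10);
            System.out.println(conf.getInt("hbase.regionserver.handler.count", 10));
        }
    }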
> > > > > >>
> > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
>
