hbase-user mailing list archives

From lars hofhansl <la...@apache.org>
Subject Re: HBase read performance
Date Tue, 14 Oct 2014 02:12:13 GMT
This looks like something to track.
If there's an easy way to reproduce this, could you file a jira with instructions?
(Although from the conversations here, it looks like it's not trivial to reproduce).

-- Lars



________________________________
 From: Khaled Elmeleegy <kdiaa@hotmail.com>
To: "user@hbase.apache.org" <user@hbase.apache.org> 
Sent: Monday, October 13, 2014 12:13 PM
Subject: RE: HBase read performance
 

I went through a lengthy testing process with Ted. I tried disabling just cache-on-write (COW), and also disabling both COW and the bucket cache (BC). Either way, the problem goes away. I think disabling COW in my workload effectively stops exercising the block cache, since my workload is mostly writes, so disabling COW keeps the cached blocks from churning.
I also tried both overcommitting memory and not. This is irrelevant, though: the cache operates on the virtual memory offered to it and is agnostic to the physical memory size. Memory overcommitment may only hurt performance due to swapping by the OS, but this has nothing to do with the bug.
Khaled

> Date: Sun, 12 Oct 2014 23:17:24 -0700
> Subject: Re: HBase read performance
> From: stack@duboce.net
> To: user@hbase.apache.org
> 
> On Sun, Oct 12, 2014 at 8:58 PM, Khaled Elmeleegy <kdiaa@hotmail.com> wrote:
> 
> > It goes away, but I think that's because I am basically churning the cache
> > far less. My workload is mostly writes.
> >
> >
> What exactly did you change?
> 
> Did you disable both BC and cacheonwrite or just one of these configs?
> 
> Disabling BC and/or cacheonwrite would churn your cache more?
> 
> What about the heap config?  Did you over commit mem?  If so, did you try
> doing a direct memory sizing that was w/i the bounds of physical memory
> (well within)?  Did you still see exceptions?
> 
> St.Ack
> 
> 
> 
> 
> > > Date: Sun, 12 Oct 2014 19:43:16 -0700
> > > Subject: Re: HBase read performance
> > > From: yuzhihong@gmail.com
> > > To: user@hbase.apache.org
> > >
> > > Hi,
> > > Can you turn off cacheonwrite and keep the bucket cache, to see if the
> > > problem goes away?
> > >
> > > Cheers
> > >
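(For anyone applying Ted's suggestion, it amounts to flipping only the cache-on-write keys while keeping the bucket cache settings; a sketch built from the config keys quoted later in this thread:)

    hbase-site:
      hbase.bucketcache.ioengine=offheap
      hbase.bucketcache.size=4196
      hbase.rs.cacheblocksonwrite=false
      hfile.block.index.cacheonwrite=false
      hfile.block.bloom.cacheonwrite=false
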
> > > On Fri, Oct 10, 2014 at 10:59 AM, Khaled Elmeleegy <kdiaa@hotmail.com>
> > > wrote:
> > >
> > > > Yes, I can reproduce it with some work.
> > > > The workload is basically as follows:
> > > > There are writers streaming writes to a table. Then, there is a reader
> > > > (invoked via a web interface). The reader does 1000 parallel reverse
> > > > scans, all of which end up hitting the same region in my case. The
> > > > scans are effectively "gets", as I just need to get one record off of
> > > > each of them. I just need to do a "reverse" get, which is not supported
> > > > (would be great to have :)), so I do it via a reverse scan. After a few
> > > > tries, the reader consistently hits this bug.
> > > >
> > > > This happens with these config changes:
> > > > hbase-env:
> > > >   HBASE_REGIONSERVER_OPTS=-Xmx6G -XX:MaxDirectMemorySize=5G
> > > >   -XX:CMSInitiatingOccupancyFraction=88 -XX:+AggressiveOpts -verbose:gc
> > > >   -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
> > > >   -Xloggc:/tmp/hbase-regionserver-gc.log
> > > > hbase-site:
> > > >   hbase.bucketcache.ioengine=offheap
> > > >   hbase.bucketcache.size=4196
> > > >   hbase.rs.cacheblocksonwrite=true
> > > >   hfile.block.index.cacheonwrite=true
> > > >   hfile.block.bloom.cacheonwrite=true
> > > >
> > > > Interestingly, without these config changes, I can't reproduce the
> > > > problem.
> > > > Khaled
> > > >
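(A minimal sketch of the scatter/gather reader Khaled describes, against the 0.98 client API; this is not his attached test program. The table name "delta" comes from the region server metrics later in the thread; the start keys, pool size, and key format are made-up placeholders.)

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HConnection;
    import org.apache.hadoop.hbase.client.HConnectionManager;
    import org.apache.hadoop.hbase.client.HTableInterface;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ScatterGatherSketch {

      // Placeholder start keys; the real workload derives them from timestamps.
      static List<byte[]> makeStartKeys() {
        List<byte[]> keys = new ArrayList<byte[]>();
        for (int i = 0; i < 1000; i++) {
          keys.add(Bytes.toBytes(String.format("key-%04d", i)));
        }
        return keys;
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        final HConnection conn = HConnectionManager.createConnection(conf);
        ExecutorService pool = Executors.newFixedThreadPool(100);
        List<Future<Result>> futures = new ArrayList<Future<Result>>();
        for (final byte[] startKey : makeStartKeys()) {
          futures.add(pool.submit(new Callable<Result>() {
            public Result call() throws Exception {
              HTableInterface table = conn.getTable("delta");
              try {
                // "Reverse get": a reversed scan that fetches only the first
                // row at or before startKey.
                Scan scan = new Scan(startKey);
                scan.setReversed(true);
                scan.setCaching(1);
                ResultScanner scanner = table.getScanner(scan);
                try {
                  return scanner.next();
                } finally {
                  scanner.close();
                }
              } finally {
                table.close();
              }
            }
          }));
        }
        for (Future<Result> f : futures) {
          Result row = f.get(); // gather; hand each row to the front end
        }
        pool.shutdown();
        conn.close();
      }
    }
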
> > > >
> > > > > Date: Fri, 10 Oct 2014 10:05:14 -0700
> > > > > Subject: Re: HBase read performance
> > > > > From: stack@duboce.net
> > > > > To: user@hbase.apache.org
> > > > >
> > > > > It looks like we are messing up our positioning math. From
> > > > > java.nio.Buffer#position (OpenJDK 6, lines 233-235):
> > > > >
> > > > >     public final Buffer position(int newPosition) {
> > > > >         if ((newPosition > limit) || (newPosition < 0))
> > > > >             throw new IllegalArgumentException();
> > > > >
> > > > > http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/nio/Buffer.java#233
> > > > >
> > > > > Is it easy to reproduce Khaled?  Always same region/store or spread
> > > > > across all reads?
> > > > >
> > > > >
> > > > > St.Ack
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Oct 10, 2014 at 8:31 AM, Khaled Elmeleegy <kdiaa@hotmail.com>
> > > > > wrote:
> > > > >
> > > > > > Andrew, thanks. I think this indeed was the problem.
> > > > > > To get over it, I increased the amount of memory given to the
> > > > > > region server, to avoid IO on reads. I used the configs below. In
> > > > > > my experiments, I have writers streaming their output to HBase. The
> > > > > > reader powers a web page and does this scatter/gather, where it
> > > > > > reads the 1000 keys written last and passes them to the front end.
> > > > > > With this workload, I get the exception below at the region server.
> > > > > > Again, I am using HBase (0.98.6.1). Any help is appreciated.
> > > > > > hbase-env:
> > > > > >   HBASE_REGIONSERVER_OPTS=-Xmx6G -XX:MaxDirectMemorySize=5G
> > > > > >   -XX:CMSInitiatingOccupancyFraction=88 -XX:+AggressiveOpts
> > > > > >   -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
> > > > > >   -Xloggc:/tmp/hbase-regionserver-gc.log
> > > > > > hbase-site:
> > > > > >   hbase.bucketcache.ioengine=offheap
> > > > > >   hfile.block.cache.size=0.4
> > > > > >   hbase.bucketcache.size=4196
> > > > > >   hbase.rs.cacheblocksonwrite=true
> > > > > >   hfile.block.index.cacheonwrite=true
> > > > > >   hfile.block.bloom.cacheonwrite=true
> > > > > >
> > > > > > 2014-10-10 15:06:44,173 ERROR
> > > > > > [B.DefaultRpcServer.handler=62,queue=2,port=60020] ipc.RpcServer:
> > > > > > Unexpected throwable object
> > > > > > java.lang.IllegalArgumentException
> > > > > >   at java.nio.Buffer.position(Buffer.java:236)
> > > > > >   at org.apache.hadoop.hbase.util.ByteBufferUtils.skip(ByteBufferUtils.java:434)
> > > > > >   at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.readKeyValueLen(HFileReaderV2.java:849)
> > > > > >   at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.next(HFileReaderV2.java:760)
> > > > > >   at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:248)
> > > > > >   at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:152)
> > > > > >   at org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:317)
> > > > > >   at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:176)
> > > > > >   at org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:1780)
> > > > > >   at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:3758)
> > > > > >   at org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:1950)
> > > > > >   at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1936)
> > > > > >   at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1913)
> > > > > >   at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3157)
> > > > > >   at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29587)
> > > > > >   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2027)
> > > > > >   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
> > > > > >   at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
> > > > > >   at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
> > > > > >   at java.lang.Thread.run(Thread.java:744)
> > > > > > > From: apurtell@apache.org
> > > > > > > Date: Tue, 7 Oct 2014 17:09:35 -0700
> > > > > > > Subject: Re: HBase read performance
> > > > > > > To: user@hbase.apache.org
> > > > > > >
> > > > > > > > The cluster has 2 m1.large nodes.
> > > > > > >
> > > > > > > That's the problem right there.
> > > > > > >
> > > > > > > You need to look at c3.4xlarge or i2 instances as a minimum
> > > > > > > requirement. M1 and even M3 instance types have ridiculously poor IO.
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Oct 7, 2014 at 3:01 PM, Khaled Elmeleegy <kdiaa@hotmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Thanks Nicolas, Qiang.
> > > > > > > >
> > > > > > > > I was able to write a simple program that reproduces the problem
> > > > > > > > on a tiny HBase cluster on EC2. The cluster has 2 m1.large nodes.
> > > > > > > > One node runs the master, name node, and zookeeper. The other
> > > > > > > > node runs a data node and a region server, with heap size
> > > > > > > > configured to be 6GB. There, the 1000 parallel reverse gets
> > > > > > > > (reverse scans) take 7-8 seconds. The data set is tiny (10M
> > > > > > > > records, each having a small number of bytes). As I said before,
> > > > > > > > all hardware resources are very idle there.
> > > > > > > >
> > > > > > > > Interestingly, running the same workload on my macbook, the 1000
> > > > > > > > parallel gets take ~200ms on a pseudo-distributed installation.
> > > > > > > >
> > > > > > > > Any help to resolve this mystery is highly appreciated.
> > > > > > > >
> > > > > > > > P.S. please find my test program attached.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Khaled
> > > > > > > >
> > > > > > > >
> > > > > > > > > From: nkeywal@gmail.com
> > > > > > > > > Date: Mon, 6 Oct 2014 09:40:48 +0200
> > > > > > > > > Subject: Re: HBase read performance
> > > > > > > > > To: user@hbase.apache.org
> > > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > I haven't seen it mentioned, but if I understand correctly,
> > > > > > > > > each scan returns a single row? If so, you should use
> > > > > > > > > Scan#setSmall to save some rpc calls.
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > >
> > > > > > > > > Nicolas
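(A hedged sketch of Nicolas's Scan#setSmall suggestion on the 0.98 API; the table handle and start key are assumed to come from the caller. Whether a small scan can also be reversed in 0.98 is worth verifying, so the reversed flag is left out here.)

    import java.io.IOException;

    import org.apache.hadoop.hbase.client.HTableInterface;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;

    public class SmallScanSketch {
      // Fetch a single row with a small scan: when the result fits in one
      // data block, the client can satisfy it in one RPC instead of the usual
      // openScanner/next/closeScanner round trips.
      static Result readOneSmall(HTableInterface table, byte[] startKey) throws IOException {
        Scan scan = new Scan(startKey);
        scan.setCaching(1);
        scan.setSmall(true);
        ResultScanner scanner = table.getScanner(scan);
        try {
          return scanner.next();
        } finally {
          scanner.close();
        }
      }
    }
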
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Sun, Oct 5, 2014 at 11:28 AM, Qiang Tian <tianq01@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > When using a separate HConnection instance, both its RpcClient
> > > > > > > > > > instance (which maintains connections to regionservers) and
> > > > > > > > > > its Registry instance (which maintains the connection to
> > > > > > > > > > zookeeper) will be separate.
> > > > > > > > > >
> > > > > > > > > > See
> > > > > > > > > > http://shammijayasinghe.blogspot.com/2012/02/zookeeper-increase-maximum-number-of.html
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Sun, Oct 5, 2014 at 2:24 PM, Khaled Elmeleegy <kdiaa@hotmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > I tried creating my own pool of HConnections to use for my
> > > > > > > > > > > HBase calls, so that not all the (2K) threads share the same
> > > > > > > > > > > HConnection. However, I could only have 10 HConnections;
> > > > > > > > > > > beyond that I get ZK exceptions, please find one below.
> > > > > > > > > > > Also, with 10 HConnections, I don't see a noticeable
> > > > > > > > > > > improvement in performance so far.
> > > > > > > > > > > 2014-10-05 06:11:26,490 WARN [main] zookeeper.RecoverableZooKeeper
> > > > > > > > > > > (RecoverableZooKeeper.java:retryOrThrow(253)) - Possibly transient
> > > > > > > > > > > ZooKeeper, quorum=54.68.206.252:2181,
> > > > > > > > > > > exception=org.apache.zookeeper.KeeperException$ConnectionLossException:
> > > > > > > > > > > KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
> > > > > > > > > > > 2014-10-05 06:11:26,490 INFO [main] util.RetryCounter
> > > > > > > > > > > (RetryCounter.java:sleepUntilNextRetry(155)) - Sleeping 1000ms before retry #0...
> > > > > > > > > > > 2014-10-05 06:11:27,845 WARN [main] zookeeper.RecoverableZooKeeper
> > > > > > > > > > > (RecoverableZooKeeper.java:retryOrThrow(253)) - Possibly transient
> > > > > > > > > > > ZooKeeper, quorum=54.68.206.252:2181,
> > > > > > > > > > > exception=org.apache.zookeeper.KeeperException$ConnectionLossException:
> > > > > > > > > > > KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
> > > > > > > > > > > 2014-10-05 06:11:27,849 INFO [main] util.RetryCounter
> > > > > > > > > > > (RetryCounter.java:sleepUntilNextRetry(155)) - Sleeping 2000ms before retry #1...
> > > > > > > > > > > 2014-10-05 06:11:30,405 WARN [main] zookeeper.RecoverableZooKeeper
> > > > > > > > > > > (RecoverableZooKeeper.java:retryOrThrow(253)) - Possibly transient
> > > > > > > > > > > ZooKeeper, quorum=54.68.206.252:2181,
> > > > > > > > > > > exception=org.apache.zookeeper.KeeperException$ConnectionLossException:
> > > > > > > > > > > KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
> > > > > > > > > > > 2014-10-05 06:11:30,405 INFO [main] util.RetryCounter
> > > > > > > > > > > (RetryCounter.java:sleepUntilNextRetry(155)) - Sleeping 4000ms before retry #2...
> > > > > > > > > > > 2014-10-05 06:11:35,278 WARN [main] zookeeper.RecoverableZooKeeper
> > > > > > > > > > > (RecoverableZooKeeper.java:retryOrThrow(253)) - Possibly transient
> > > > > > > > > > > ZooKeeper, quorum=54.68.206.252:2181,
> > > > > > > > > > > exception=org.apache.zookeeper.KeeperException$ConnectionLossException:
> > > > > > > > > > > KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
> > > > > > > > > > > 2014-10-05 06:11:35,279 INFO [main] util.RetryCounter
> > > > > > > > > > > (RetryCounter.java:sleepUntilNextRetry(155)) - Sleeping 8000ms before retry #3...
> > > > > > > > > > > 2014-10-05 06:11:44,393 WARN [main] zookeeper.RecoverableZooKeeper
> > > > > > > > > > > (RecoverableZooKeeper.java:retryOrThrow(253)) - Possibly transient
> > > > > > > > > > > ZooKeeper, quorum=54.68.206.252:2181,
> > > > > > > > > > > exception=org.apache.zookeeper.KeeperException$ConnectionLossException:
> > > > > > > > > > > KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
> > > > > > > > > > > 2014-10-05 06:11:44,393 ERROR [main] zookeeper.RecoverableZooKeeper
> > > > > > > > > > > (RecoverableZooKeeper.java:retryOrThrow(255)) - ZooKeeper exists failed
> > > > > > > > > > > after 4 attempts
> > > > > > > > > > > 2014-10-05 06:11:44,394 WARN [main] zookeeper.ZKUtil
> > > > > > > > > > > (ZKUtil.java:checkExists(482)) - hconnection-0x4e174f3b,
> > > > > > > > > > > quorum=54.68.206.252:2181, baseZNode=/hbase Unable to set watcher on
> > > > > > > > > > > znode (/hbase/hbaseid)
> > > > > > > > > > > org.apache.zookeeper.KeeperException$ConnectionLossException:
> > > > > > > > > > > KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
> > > > > > > > > > >   at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> > > > > > > > > > >   at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> > > > > > > > > > >   at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
> > > > > > > > > > >   at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:199)
> > > > > > > > > > >   at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:479)
> > > > > > > > > > >   at org.apache.hadoop.hbase.zookeeper.ZKClusterId.readClusterIdZNode(ZKClusterId.java:65)
> > > > > > > > > > >   at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getClusterId(ZooKeeperRegistry.java:83)
> > > > > > > > > > >   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.retrieveClusterId(HConnectionManager.java:857)
> > > > > > > > > > >   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.<init>(HConnectionManager.java:662)
> > > > > > > > > > >   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> > > > > > > > > > >   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
> > > > > > > > > > >   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> > > > > > > > > > >   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
> > > > > > > > > > >   at org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:414)
> > > > > > > > > > >   at org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:335)...
> > > > > > > > > > > > From: kdiaa@hotmail.com
> > > > > > > > > > > > To: user@hbase.apache.org
> > > > > > > > > > > > Subject: RE: HBase read performance
> > > > > > > > > > > > Date: Fri, 3 Oct 2014 12:37:39 -0700
> > > > > > > > > > > >
> > > > > > > > > > > > Lars, Ted, and Qiang,
> > > > > > > > > > > > Thanks for all the input.
> > > > > > > > > > > > Qiang: yes, all the threads are in the same client process,
> > > > > > > > > > > > sharing the same connection. And since I don't see hardware
> > > > > > > > > > > > contention, maybe there is contention over this code path.
> > > > > > > > > > > > I'll try using many connections, see if it alleviates the
> > > > > > > > > > > > problem, and report back.
> > > > > > > > > > > > Thanks again,
> > > > > > > > > > > > Khaled
> > > > > > > > > > > >
> > > > > > > > > > > > > Date: Fri, 3 Oct 2014 15:18:30 +0800
> > > > > > > > > > > > > Subject: Re: HBase read performance
> > > > > > > > > > > > > From: tianq01@gmail.com
> > > > > > > > > > > > > To: user@hbase.apache.org
> > > > > > > > > > > > >
> > > > > > > > > > > > > Regarding profiling, Andrew introduced
> > > > > > > > > > > > > http://www.brendangregg.com/blog/2014-06-12/java-flame-graphs.html
> > > > > > > > > > > > > months ago.
> > > > > > > > > > > > >
> > > > > > > > > > > > > processCallTime comes from RpcServer#call, so it looks good?
> > > > > > > > > > > > >
> > > > > > > > > > > > > I have a suspect:
> > > > > > > > > > > > > https://issues.apache.org/jira/browse/HBASE-11306
> > > > > > > > > > > > >
> > > > > > > > > > > > > How many processes do you have for your 2000 threads? If
> > > > > > > > > > > > > only 1 process, those threads will share just 1 connection
> > > > > > > > > > > > > to that regionserver; there might be big contention on the
> > > > > > > > > > > > > RPC code path. For such a case, could you try using
> > > > > > > > > > > > > different connections?
> > > > > > > > > > > > > https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HConnectionManager.html
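(A hedged sketch of "using different connections" on the 0.98 API; the pool size of 10 matches what Khaled reports trying earlier in the thread, and the class and method names here are placeholders.)

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HConnection;
    import org.apache.hadoop.hbase.client.HConnectionManager;
    import org.apache.hadoop.hbase.client.HTableInterface;

    public class ConnectionPoolSketch {
      private final HConnection[] pool;

      // Each createConnection() call returns an independent HConnection with
      // its own RpcClient (regionserver sockets) and its own zookeeper
      // registry, per Qiang's note above.
      public ConnectionPoolSketch(int size) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        pool = new HConnection[size];
        for (int i = 0; i < size; i++) {
          pool[i] = HConnectionManager.createConnection(conf);
        }
      }

      // Worker threads pick a connection by index instead of all sharing one.
      public HTableInterface getTable(String tableName, int threadIndex) throws IOException {
        return pool[threadIndex % pool.length].getTable(tableName);
      }

      public void close() throws IOException {
        for (HConnection c : pool) {
          c.close();
        }
      }
    }
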
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, Oct 3, 2014 at 9:55 AM, Ted Yu <yuzhihong@gmail.com>
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Khaled:
> > > > > > > > > > > > > > Do you have a profiler such as JProfiler?
> > > > > > > > > > > > > > Profiling would give us more hints.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Otherwise, capturing a stack trace during the period of
> > > > > > > > > > > > > > the reverse scan would help.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Cheers
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Thu, Oct 2, 2014 at 4:52 PM, lars hofhansl <larsh@apache.org>
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > You might have the data in the OS buffer cache.
> > > > > > > > > > > > > > > Without short circuit reading, the region server has to
> > > > > > > > > > > > > > > request the block from the data node process, which
> > > > > > > > > > > > > > > then reads it from the OS buffer cache.
> > > > > > > > > > > > > > > That is a few context switches per RPC that do not show
> > > > > > > > > > > > > > > up in CPU metrics. In that case you also would not see
> > > > > > > > > > > > > > > disk IO.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > If - as you say - you see a lot of evicted blocks, the
> > > > > > > > > > > > > > > data *has* to come from the OS. If you do not see disk
> > > > > > > > > > > > > > > IO, it *has* to come from the OS cache. I.e. there's
> > > > > > > > > > > > > > > more RAM on your boxes, and you should increase the
> > > > > > > > > > > > > > > heap and the block cache.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > You can measure the context switches with vmstat. Other
> > > > > > > > > > > > > > > than that I have no suggestions until I reproduce the
> > > > > > > > > > > > > > > problem. Also check the data locality index of the
> > > > > > > > > > > > > > > region server; it should be close to 100%.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > -- Lars
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > ________________________________
> > > > > > > > > > > > > > > From: Khaled Elmeleegy <kdiaa@hotmail.com>
> > > > > > > > > > > > > > > To: "user@hbase.apache.org" <user@hbase.apache.org>
> > > > > > > > > > > > > > > Sent: Thursday, October 2, 2014 3:24 PM
> > > > > > > > > > > > > > > Subject: RE: HBase read performance
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Lars, thanks a lot for all the tips. I'll make sure I
> > > > > > > > > > > > > > > cover all of them and get back to you. I am not sure
> > > > > > > > > > > > > > > they are the bottleneck though, as they are all about
> > > > > > > > > > > > > > > optimizing physical resource usage. As I said, I don't
> > > > > > > > > > > > > > > see any contended physical resources now. I'll also try
> > > > > > > > > > > > > > > to reproduce this problem in a simpler environment and
> > > > > > > > > > > > > > > pass you the test program to play with.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > A couple of high level points to make. You are right
> > > > > > > > > > > > > > > that my use case is kind of a worst case for HBase
> > > > > > > > > > > > > > > reads. But, if things go the way you described them,
> > > > > > > > > > > > > > > there should be tons of disk IO, and that should
> > > > > > > > > > > > > > > clearly be the bottleneck. This is not the case though,
> > > > > > > > > > > > > > > for the simple reason that this is done in a test
> > > > > > > > > > > > > > > environment (I am still prototyping), and not a lot of
> > > > > > > > > > > > > > > data is yet written to HBase. However, for the real use
> > > > > > > > > > > > > > > case, there should be writers constantly writing data
> > > > > > > > > > > > > > > to HBase and readers occasionally doing this
> > > > > > > > > > > > > > > scatter/gather. At steady state, things should only get
> > > > > > > > > > > > > > > worse and all the issues you mentioned should get far
> > > > > > > > > > > > > > > more pronounced. At that point, one can try to mitigate
> > > > > > > > > > > > > > > it using more memory or so. I am not there yet, as I
> > > > > > > > > > > > > > > think I am hitting some software bottleneck, which I
> > > > > > > > > > > > > > > don't know how to work around.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Khaled
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > ----------------------------------------
> > > > > > > > > > > > > > > > Date: Thu, 2 Oct 2014 14:20:47 -0700
> > > > > > > > > > > > > > > > From: larsh@apache.org
> > > > > > > > > > > > > > > > Subject: Re: HBase read performance
> > > > > > > > > > > > > > > > To: user@hbase.apache.org
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > OK... We might need to investigate this.
> > > > > > > > > > > > > > > > Any chance you can provide a minimal test program and
> > > > > > > > > > > > > > > > instructions for how to set it up? We can do some
> > > > > > > > > > > > > > > > profiling then.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > One thing to note is that with scanning, HBase cannot
> > > > > > > > > > > > > > > > use bloom filters to rule out HFiles ahead of time;
> > > > > > > > > > > > > > > > it needs to look into all of them.
> > > > > > > > > > > > > > > > So you kind of hit on the absolute worst case:
> > > > > > > > > > > > > > > > - random reads that do not fit into the block cache
> > > > > > > > > > > > > > > > - cannot use bloom filters
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > A few more questions/comments:
> > > > > > > > > > > > > > > > - Do you have short circuit reading enabled? If not,
> > > > > > > > > > > > > > > > you should.
> > > > > > > > > > > > > > > > - Is your table major compacted? That will reduce the
> > > > > > > > > > > > > > > > number of files to look at.
> > > > > > > > > > > > > > > > - Did you disable Nagle's everywhere (enable
> > > > > > > > > > > > > > > > tcpnodelay)? It is disabled by default in HBase, but
> > > > > > > > > > > > > > > > not necessarily in your install of HDFS.
> > > > > > > > > > > > > > > > - Which version of HDFS are you using as the backing
> > > > > > > > > > > > > > > > filesystem?
> > > > > > > > > > > > > > > > - If your disk is idle, it means the data fits into
> > > > > > > > > > > > > > > > the OS buffer cache. In turn that means you can
> > > > > > > > > > > > > > > > increase the heap for the region servers. You can
> > > > > > > > > > > > > > > > also use block encoding (FAST_DIFF) to try to make
> > > > > > > > > > > > > > > > sure the entire working set fits into the cache (see
> > > > > > > > > > > > > > > > the sketch after this message).
> > > > > > > > > > > > > > > > - Also try to reduce the block size - although if
> > > > > > > > > > > > > > > > your overall working set does not fit in the heap it
> > > > > > > > > > > > > > > > won't help much.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > This is a good section of the book to read through
> > > > > > > > > > > > > > > > generally (even though you might know most of this
> > > > > > > > > > > > > > > > already):
> > > > > > > > > > > > > > > > http://hbase.apache.org/book.html#perf.configurations
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > -- Lars
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
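(A sketch of the FAST_DIFF and block-size items from Lars's list, using the 0.98 admin API; the table name "delta" and family name "d" are assumptions, and depending on the cluster's online schema change settings the table may need to be disabled before modifyColumn.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;
    import org.apache.hadoop.hbase.util.Bytes;

    public class TuneFamilySketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        try {
          TableName table = TableName.valueOf("delta");                  // assumed table
          HTableDescriptor desc = admin.getTableDescriptor(table);
          HColumnDescriptor family = desc.getFamily(Bytes.toBytes("d")); // assumed family
          family.setDataBlockEncoding(DataBlockEncoding.FAST_DIFF);      // block encoding
          family.setBlocksize(8 * 1024);                                 // 8k instead of the 64k default
          admin.modifyColumn(table, family);
        } finally {
          admin.close();
        }
      }
    }
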
> > > > > > > > > > > > > > > > ----- Original Message -----
> > > > > > > > > > > > > > > > From: Khaled Elmeleegy <kdiaa@hotmail.com>
> > > > > > > > > > > > > > > > To: "user@hbase.apache.org" <user@hbase.apache.org>
> > > > > > > > > > > > > > > > Cc:
> > > > > > > > > > > > > > > > Sent: Thursday, October 2, 2014 11:27 AM
> > > > > > > > > > > > > > > > Subject: RE: HBase read performance
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I do see a very brief spike in CPU (user/system), but
> > > > > > > > > > > > > > > > it's nowhere near 0% idle. It goes from 99% idle down
> > > > > > > > > > > > > > > > to something like 40% idle for a second or so. The
> > > > > > > > > > > > > > > > thing to note is that this is all on a test cluster,
> > > > > > > > > > > > > > > > so there is no real load. Things are generally idle
> > > > > > > > > > > > > > > > until I issue 2-3 of these multi-scan-requests to
> > > > > > > > > > > > > > > > render a web page. Then you see the spike in the cpu
> > > > > > > > > > > > > > > > and some activity in the network and disk, but
> > > > > > > > > > > > > > > > nowhere near saturation.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > If there are specific tests you'd like me to do to
> > > > > > > > > > > > > > > > debug this, I'd be more than happy to do them.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Khaled
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > ----------------------------------------
> > > > > > > > > > > > > > > >> Date: Thu, 2 Oct 2014 11:15:59 -0700
> > > > > > > > > > > > > > > >> From: larsh@apache.org
> > > > > > > > > > > > > > > >> Subject: Re: HBase read performance
> > > > > > > > > > > > > > > >> To: user@hbase.apache.org
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >> I still think you're waiting on disk. No IOWAIT? So
> > > > > > > > > > > > > > > >> CPU is not waiting a lot for IO. No high User/System
> > > > > > > > > > > > > > > >> CPU either?
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >> If you see a lot of evicted blocks, then each RPC has
> > > > > > > > > > > > > > > >> a high chance of needing to bring an entire 64k block
> > > > > > > > > > > > > > > >> in. You'll see bad performance with this.
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >> We might need to trace this specific scenario.
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >> -- Lars
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >> ________________________________
> > > > > > > > > > > > > > > >> From: Khaled Elmeleegy <kdiaa@hotmail.com>
> > > > > > > > > > > > > > > >> To: "user@hbase.apache.org" <user@hbase.apache.org>
> > > > > > > > > > > > > > > >> Sent: Thursday, October 2, 2014 10:46 AM
> > > > > > > > > > > > > > > >> Subject: RE: HBase read performance
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >> I've set the heap size to 6GB and I do have gc
> > > > > > > > > > > > > > > >> logging. No long pauses there -- occasional 0.1s or
> > > > > > > > > > > > > > > >> 0.2s.
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >> Other than the discrepancy between what's reported on
> > > > > > > > > > > > > > > >> the client and what's reported at the RS, there is
> > > > > > > > > > > > > > > >> also the issue of not getting proper concurrency. So,
> > > > > > > > > > > > > > > >> even if a reverse get takes 100ms or so (this has to
> > > > > > > > > > > > > > > >> be mostly blocking on various things, as no physical
> > > > > > > > > > > > > > > >> resource is contended), the other gets/scans should
> > > > > > > > > > > > > > > >> be able to proceed in parallel, so a thousand
> > > > > > > > > > > > > > > >> concurrent gets/scans should finish in a few hundred
> > > > > > > > > > > > > > > >> ms, not many seconds. That's why I thought I'd
> > > > > > > > > > > > > > > >> increase the handler count to try to get more
> > > > > > > > > > > > > > > >> concurrency, but it didn't help. So, there must be
> > > > > > > > > > > > > > > >> something else.
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >> Khaled
> > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >> ----------------------------------------
> > > > > > > > > > > > > > > >>> From: ndimiduk@gmail.com
> > > > > > > > > > > > > > > >>> Date: Thu, 2 Oct 2014 10:36:39 -0700
> > > > > > > > > > > > > > > >>> Subject: Re: HBase read performance
> > > > > > > > > > > > > > > >>> To: user@hbase.apache.org
> > > > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > > >>> Do check again on the heap size of the region
> > > > > > > > > > > > > > > >>> servers. The default unconfigured size is 1G; too
> > > > > > > > > > > > > > > >>> small for much of anything. Check your RS logs --
> > > > > > > > > > > > > > > >>> look for lines produced by the JVMPauseMonitor
> > > > > > > > > > > > > > > >>> thread. They usually correlate with long GC pauses
> > > > > > > > > > > > > > > >>> or other process-freeze events.
> > > > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > > >>> Get is implemented as a Scan of a single row, so a
> > > > > > > > > > > > > > > >>> reverse scan of a single row should be functionally
> > > > > > > > > > > > > > > >>> equivalent.
> > > > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > > >>> In practice, I have seen discrepancy between the
> > > > > > > > > > > > > > > >>> latencies reported by the RS and the latencies
> > > > > > > > > > > > > > > >>> experienced by the client. I've not investigated
> > > > > > > > > > > > > > > >>> this area thoroughly.
> > > > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > > >>> On Thu, Oct 2, 2014 at 10:05 AM, Khaled Elmeleegy <kdiaa@hotmail.com>
> > > > > > > > > > > > > > > >>> wrote:
> > > > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > > >>>> Thanks Lars for your quick reply.
> > > > > > > > > > > > > > > >>>>
> > > > > > > > > > > > > > > >>>> Yes, performance is similar with fewer handlers (I
> > > > > > > > > > > > > > > >>>> tried with 100 first).
> > > > > > > > > > > > > > > >>>>
> > > > > > > > > > > > > > > >>>> The payload is not big, ~1KB or so. The working set
> > > > > > > > > > > > > > > >>>> doesn't seem to fit in memory, as there are many
> > > > > > > > > > > > > > > >>>> cache misses. However, disk is far from being a
> > > > > > > > > > > > > > > >>>> bottleneck; I checked using iostat. I also verified
> > > > > > > > > > > > > > > >>>> that neither the network nor the CPU of the region
> > > > > > > > > > > > > > > >>>> server or the client is a bottleneck. This leads me
> > > > > > > > > > > > > > > >>>> to believe that this is likely a software
> > > > > > > > > > > > > > > >>>> bottleneck, possibly due to a misconfiguration on
> > > > > > > > > > > > > > > >>>> my side. I just don't know how to debug it. A clear
> > > > > > > > > > > > > > > >>>> disconnect I see is the individual request latency
> > > > > > > > > > > > > > > >>>> as reported by metrics on the region server (IPC
> > > > > > > > > > > > > > > >>>> processCallTime vs scanNext) vs what's measured on
> > > > > > > > > > > > > > > >>>> the client. Does this sound right? Any ideas on how
> > > > > > > > > > > > > > > >>>> to better debug it?
> > > > > > > > > > > > > > > >>>>
> > > > > > > > > > > > > > > >>>> About the trick with the timestamps to be able to
> > > > > > > > > > > > > > > >>>> do a forward scan, thanks for pointing it out.
> > > > > > > > > > > > > > > >>>> Actually, I am aware of it. The problem I have is,
> > > > > > > > > > > > > > > >>>> sometimes I want to get the key after a particular
> > > > > > > > > > > > > > > >>>> timestamp and sometimes I want to get the key
> > > > > > > > > > > > > > > >>>> before, so just relying on the key order doesn't
> > > > > > > > > > > > > > > >>>> work. Ideally, I want a reverse get(). I thought a
> > > > > > > > > > > > > > > >>>> reverse scan could do the trick though.
> > > > > > > > > > > > > > > >>>>
> > > > > > > > > > > > > > > >>>> Khaled
> > > > > > > > > > > > > > > >>>>
> > > > > > > > > > > > > > > >>>> ----------------------------------------
> > > > > > > > > > > > > > > >>>>> Date: Thu, 2 Oct 2014 09:40:37 -0700
> > > > > > > > > > > > > > > >>>>> From: larsh@apache.org
> > > > > > > > > > > > > > > >>>>> Subject: Re: HBase read performance
> > > > > > > > > > > > > > > >>>>> To: user@hbase.apache.org
> > > > > > > > > > > > > > > >>>>>
> > > > > > > > > > > > > > > >>>>> Hi Khaled,
> > > > > > > > > > > > > > > >>>>> is it the same with fewer threads? 1500 handler
> > > > > > > > > > > > > > > >>>>> threads seems to be a lot. Typically a good number
> > > > > > > > > > > > > > > >>>>> of threads depends on the hardware (number of
> > > > > > > > > > > > > > > >>>>> cores, number of spindles, etc). I cannot think of
> > > > > > > > > > > > > > > >>>>> any type of scenario where more than 100 would
> > > > > > > > > > > > > > > >>>>> give any improvement.
> > > > > > > > > > > > > > > >>>>>
> > > > > > > > > > > > > > > >>>>> How large is the payload per KV retrieved that
> > > > > > > > > > > > > > > >>>>> way? If large (as in a few 100k) you definitely
> > > > > > > > > > > > > > > >>>>> want to lower the number of handler threads.
> > > > > > > > > > > > > > > >>>>> How much heap do you give the region server? Does
> > > > > > > > > > > > > > > >>>>> the working set fit into the cache? (I.e., in the
> > > > > > > > > > > > > > > >>>>> metrics, do you see the eviction count going up?
> > > > > > > > > > > > > > > >>>>> If so, it does not fit into the cache.)
> > > > > > > > > > > > > > > >>>>>
> > > > > > > > > > > > > > > >>>>> If the working set does not fit into the cache
> > > > > > > > > > > > > > > >>>>> (eviction count goes up), then HBase will need to
> > > > > > > > > > > > > > > >>>>> bring a new block in from disk on each Get
> > > > > > > > > > > > > > > >>>>> (assuming the Gets are more or less random as far
> > > > > > > > > > > > > > > >>>>> as the server is concerned). In that case you'll
> > > > > > > > > > > > > > > >>>>> benefit from reducing the HFile block size (from
> > > > > > > > > > > > > > > >>>>> 64k to 8k or even 4k).
> > > > > > > > > > > > > > > >>>>>
> > > > > > > > > > > > > > > >>>>> Lastly, I don't think we tested the performance of
> > > > > > > > > > > > > > > >>>>> using reverse scan this way; there is probably
> > > > > > > > > > > > > > > >>>>> room to optimize this.
> > > > > > > > > > > > > > > >>>>> Can you restructure your keys to allow forward
> > > > > > > > > > > > > > > >>>>> scanning? For example, you could store the time as
> > > > > > > > > > > > > > > >>>>> MAX_LONG-time. Or you could invert all the bits of
> > > > > > > > > > > > > > > >>>>> the time portion of the key, so that it sorts the
> > > > > > > > > > > > > > > >>>>> other way. Then you could do a forward scan (see
> > > > > > > > > > > > > > > >>>>> the sketch after this message).
> > > > > > > > > > > > > > > >>>>>
> > > > > > > > > > > > > > > >>>>> Let us know how it goes.
> > > > > > > > > > > > > > > >>>>>
> > > > > > > > > > > > > > > >>>>> -- Lars
> > > > > > > > > > > > > > > >>>>>
> > > > > > > > > > > > > > > >>>>>
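(A minimal sketch of the key inversion Lars describes, assuming non-negative millisecond timestamps and a hypothetical "entity prefix + time" row key layout. With the time portion stored as MAX_LONG-time, newer entries sort first, so a plain forward scan started at the encoded key finds the latest entry at or before a given timestamp.)

    import org.apache.hadoop.hbase.util.Bytes;

    public class ReverseTimeKeySketch {
      // Encode prefix + (Long.MAX_VALUE - time). For non-negative times the
      // inverted value is also non-negative, so the big-endian bytes from
      // Bytes.toBytes(long) sort in descending time order.
      static byte[] rowKey(byte[] prefix, long timeMillis) {
        return Bytes.add(prefix, Bytes.toBytes(Long.MAX_VALUE - timeMillis));
      }

      public static void main(String[] args) {
        byte[] prefix = Bytes.toBytes("sensor42|"); // placeholder entity prefix
        byte[] newer = rowKey(prefix, 2000L);
        byte[] older = rowKey(prefix, 1000L);
        // newer sorts before older, so a forward scan starting at
        // rowKey(prefix, t) first returns the latest entry with time <= t.
        System.out.println(Bytes.compareTo(newer, older) < 0); // prints true
      }
    }
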
> > > > > > > > > > > > > > > >>>>> ----- Original Message -----
> > > > > > > > > > > > > > > >>>>> From: Khaled Elmeleegy <kdiaa@hotmail.com>
> > > > > > > > > > > > > > > >>>>> To: "user@hbase.apache.org" <user@hbase.apache.org>
> > > > > > > > > > > > > > > >>>>> Cc:
> > > > > > > > > > > > > > > >>>>> Sent: Thursday, October 2, 2014 12:12 AM
> > > > > > > > > > > > > > > >>>>> Subject: HBase read performance
> > > > > > > > > > > > > > > >>>>>
> > > > > > > > > > > > > > > >>>>> Hi,
> > > > > > > > > > > > > > > >>>>>
> > > > > > > > > > > > > > > >>>>> I am trying to do a scatter/gather on hbase
> > > > > > > > > > > > > > > >>>>> (0.98.6.1), where I have a client reading ~1000
> > > > > > > > > > > > > > > >>>>> keys from an HBase table. These keys happen to
> > > > > > > > > > > > > > > >>>>> fall on the same region server. For my reads I use
> > > > > > > > > > > > > > > >>>>> reverse scan to read each key, as I want the key
> > > > > > > > > > > > > > > >>>>> prior to a specific time stamp (time stamps are
> > > > > > > > > > > > > > > >>>>> stored in reverse order). I don't believe gets can
> > > > > > > > > > > > > > > >>>>> accomplish that, right? So I use scan, with
> > > > > > > > > > > > > > > >>>>> caching set to 1.
> > > > > > > > > > > > > > > >>>>>
> > > > > > > > > > > > > > > >>>>> I use 2000 reader threads in the client, and on
> > > > > > > > > > > > > > > >>>>> HBase I've set hbase.regionserver.handler.count to
> > > > > > > > > > > > > > > >>>>> 1500. With this setup, my scatter/gather is very
> > > > > > > > > > > > > > > >>>>> slow and can take up to 10s in total. Timing an
> > > > > > > > > > > > > > > >>>>> individual getScanner(..) call on the client side,
> > > > > > > > > > > > > > > >>>>> it can easily take a few hundred ms. I also got
> > > > > > > > > > > > > > > >>>>> the following metrics from the region server in
> > > > > > > > > > > > > > > >>>>> question:
> > > > > > > > > > > > > > > >>>>>
> > > > > > > > > > > > > > > >>>>> "queueCallTime_mean" : 2.190855525775637,
> > > > > > > > > > > > > > > >>>>> "queueCallTime_median" : 0.0,
> > > > > > > > > > > > > > > >>>>> "queueCallTime_75th_percentile" : 0.0,
> > > > > > > > > > > > > > > >>>>> "queueCallTime_95th_percentile" : 1.0,
> > > > > > > > > > > > > > > >>>>> "queueCallTime_99th_percentile" : 556.9799999999818,
> > > > > > > > > > > > > > > >>>>>
> > > > > > > > > > > > > > > >>>>> "processCallTime_min" : 0,
> > > > > > > > > > > > > > > >>>>> "processCallTime_max" : 12755,
> > > > > > > > > > > > > > > >>>>> "processCallTime_mean" : 105.64873440912682,
> > > > > > > > > > > > > > > >>>>> "processCallTime_median" : 0.0,
> > > > > > > > > > > > > > > >>>>> "processCallTime_75th_percentile" : 2.0,
> > > > > > > > > > > > > > > >>>>> "processCallTime_95th_percentile" : 7917.95,
> > > > > > > > > > > > > > > >>>>> "processCallTime_99th_percentile" : 8876.89,
> > > > > > > > > > > > > > > >>>>>
> > > > > > > > > > > > > > > >>>>> "namespace_default_table_delta_region_87be70d7710f95c05cfcc90181d183b4_metric_scanNext_min" : 89,
> > > > > > > > > > > > > > > >>>>> "namespace_default_table_delta_region_87be70d7710f95c05cfcc90181d183b4_metric_scanNext_max" : 11300,
> > > > > > > > > > > > > > > >>>>> "namespace_default_table_delta_region_87be70d7710f95c05cfcc90181d183b4_metric_scanNext_mean" : 654.4949739797315,
> > > > > > > > > > > > > > > >>>>> "namespace_default_table_delta_region_87be70d7710f95c05cfcc90181d183b4_metric_scanNext_median" : 101.0,
> > > > > > > > > > > > > > > >>>>> "namespace_default_table_delta_region_87be70d7710f95c05cfcc90181d183b4_metric_scanNext_75th_percentile" : 101.0,
> > > > > > > > > > > > > > > >>>>> "namespace_default_table_delta_region_87be70d7710f95c05cfcc90181d183b4_metric_scanNext_95th_percentile" : 101.0,
> > > > > > > > > > > > > > > >>>>> "namespace_default_table_delta_region_87be70d7710f95c05cfcc90181d183b4_metric_scanNext_99th_percentile" : 113.0,
> > > > > > > > > > > > > > > >>>>>
> > > > > > > > > > > > > > > >>>>> Where "delta" is the name of the table I am
> > > > > > > > > > > > > > > >>>>> querying.
> > > > > > > > > > > > > > > >>>>>
> > > > > > > > > > > > > > > >>>>> In addition to all this, I monitored the hardware
> > > > > > > > > > > > > > > >>>>> resources (CPU, disk, and network) of both the
> > > > > > > > > > > > > > > >>>>> client and the region server, and nothing seems
> > > > > > > > > > > > > > > >>>>> anywhere near saturation. So I am puzzled by
> > > > > > > > > > > > > > > >>>>> what's going on and where this time is going.
> > > > > > > > > > > > > > > >>>>>
> > > > > > > > > > > > > > > >>>>> A few things to note based on the above
> > > > > > > > > > > > > > > >>>>> measurements: both medians of IPC processCallTime
> > > > > > > > > > > > > > > >>>>> and queueCallTime are basically zero (ms, I
> > > > > > > > > > > > > > > >>>>> presume, right?). However, scanNext_median is 101
> > > > > > > > > > > > > > > >>>>> (ms too, right?). I am not sure how this adds up.
> > > > > > > > > > > > > > > >>>>> Also, even though the 101 figure seems
> > > > > > > > > > > > > > > >>>>> outrageously high and I don't know why, all these
> > > > > > > > > > > > > > > >>>>> scans should still be happening in parallel, so
> > > > > > > > > > > > > > > >>>>> the overall call should finish fast, given that no
> > > > > > > > > > > > > > > >>>>> hardware resource is contended, right? But this is
> > > > > > > > > > > > > > > >>>>> not what's happening, so I have to be missing
> > > > > > > > > > > > > > > >>>>> something(s).
> > > > > > > > > > > > > > > >>>>>
> > > > > > > > > > > > > > > >>>>> So, any help is appreciated there.
> > > > > > > > > > > > > > > >>>>>
> > > > > > > > > > > > > > > >>>>> Thanks,
> > > > > > > > > > > > > > > >>>>> Khaled
> > > > > > > > > > > > > > > >>>>
> > > > > > > > > > > > > > > >>>>
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Best regards,
> > > > > > >
> > > > > > >    - Andy
> > > > > > >
> > > > > > > Problems worthy of attack prove their worth by hitting back. - Piet
> > > > > > > Hein (via Tom White)
> > > > > >
> > > > > >
> > > >
> > > >
> >
> >