hbase-user mailing list archives

From James Baldassari <ja...@dataxu.com>
Subject Re: Optimizations for random read performance
Date Tue, 16 Feb 2010 06:05:00 GMT
I just released a new version of our client code that uses pooled HBase
clients.  In unit testing, using a client pool of size 10 increased
performance by about 40%.  In our production environment, however, it
made no difference at all.  We're still seeing the same behavior where
everything starts out fine, but then performance quickly degrades.  I
changed the block cache size across the board from 0.2 to 0.4.  This
seems to have improved the cache hit ratio slightly, but it didn't help
performance overall.  Applying HBASE-2180 isn't really an option at this
time because we've been told to stick with the Cloudera distro.
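
(The pooled client code itself isn't attached to this message.  Purely as an
illustration, here is a minimal sketch of one way to pool HTable instances
against the 0.20.x client API, since HTable is not safe for concurrent use;
the table name and pool size are invented for the example.)

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;

    /** Minimal fixed-size pool of HTable clients; HTable itself is not thread-safe. */
    public class SimpleHTablePool {
        private final BlockingQueue<HTable> pool;

        public SimpleHTablePool(HBaseConfiguration conf, String tableName, int size)
                throws Exception {
            pool = new ArrayBlockingQueue<HTable>(size);
            for (int i = 0; i < size; i++) {
                pool.put(new HTable(conf, tableName));  // one client per slot
            }
        }

        /** Borrow a client, run the Get, and always return the client afterwards. */
        public Result get(Get get) throws Exception {
            HTable table = pool.take();  // blocks until a client is free
            try {
                return table.get(get);
            } finally {
                pool.put(table);
            }
        }
    }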

If I had to guess, I would say the performance issues start to happen
around the time the region servers hit max heap size, which occurs
within minutes of exposing the app to live traffic.  Could GC be killing
us?  We use the concurrent collector as suggested.  I saw on the
performance page some mention of limiting the size of the new generation
like -XX:NewSize=6m -XX:MaxNewSize=6m.  Is that worth trying?
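
(For reference, flags along those lines would normally go into HBASE_OPTS in
hbase-env.sh on the region servers.  A sketch only: the heap size below simply
mirrors the maxHeap=4079 figures reported in the stats, and whether the small
fixed new generation actually helps this workload is exactly the open question.)

    # hbase-env.sh on the region servers -- illustrative values only
    # Concurrent (CMS) collector plus the small, fixed new generation
    # mentioned on the performance wiki page.
    export HBASE_OPTS="-Xmx4g -XX:+UseConcMarkSweepGC -XX:NewSize=6m -XX:MaxNewSize=6m"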

Here are the new region server stats along with load averages:

Region Server 1:
request=0.0, regions=16, stores=16, storefiles=33, storefileIndexSize=4, memstoreSize=1, compactionQueueSize=0,
usedHeap=2891, maxHeap=4079, blockCacheSize=1403878072, blockCacheFree=307135816, blockCacheCount=21107,
blockCacheHitRatio=84, fsReadLatency=0, fsWriteLatency=0, fsSyncLatency=0
Load Averages: 10.34, 10.58, 7.08

Region Server 2:
request=0.0, regions=15, stores=16, storefiles=26, storefileIndexSize=3, memstoreSize=1, compactionQueueSize=0,
usedHeap=3257, maxHeap=4079, blockCacheSize=661765368, blockCacheFree=193741576, blockCacheCount=9942,
blockCacheHitRatio=77, fsReadLatency=0, fsWriteLatency=0, fsSyncLatency=0
Load Averages: 1.90, 1.23, 0.98

Region Server 3:
request=0.0, regions=16, stores=16, storefiles=41, storefileIndexSize=4, memstoreSize=4, compactionQueueSize=0,
usedHeap=1627, maxHeap=4079, blockCacheSize=665117184, blockCacheFree=190389760, blockCacheCount=9995,
blockCacheHitRatio=70, fsReadLatency=0, fsWriteLatency=0, fsSyncLatency=0
Load Averages: 2.01, 3.56, 4.18

That first region server is getting hit much harder than the others.
They're identical machines (8-core), and the distribution of keys should
be fairly random, so I'm not sure why that would happen.  Any other
ideas or suggestions would be greatly appreciated.

Thanks,
James


On Mon, 2010-02-15 at 21:51 -0600, Stack wrote:
> Yeah, I was going to say that if your loading is mostly read, you can
> probably go up from the 0.2 given over to cache.  I like Dan's
> suggestion of trying it first on one server, if you can.
> 
> St.Ack
> 
> On Mon, Feb 15, 2010 at 5:22 PM, Dan Washusen <dan@reactive.org> wrote:
> > So roughly 72% of reads use the blocks held in the block cache...
> >
> > It would be interesting to see the difference between when it was working OK
> > and now.  Could you try increasing the memory allocated to one of the
> > region servers and also increasing the "hfile.block.cache.size" to say '0.4'
> > on the same region server?
> >
> > On 16 February 2010 11:54, James Baldassari <james@dataxu.com> wrote:
> >
> >> Hi Dan.  Thanks for your suggestions.  I am doing writes at the same
> >> time as reads, but there are usually many more reads than writes.  Here
> >> are the stats for all three region servers:
> >>
> >> Region Server 1:
> >> request=0.0, regions=15, stores=16, storefiles=34, storefileIndexSize=3,
> >> memstoreSize=308, compactionQueueSize=0, usedHeap=3096, maxHeap=4079,
> >> blockCacheSize=705474544, blockCacheFree=150032400, blockCacheCount=10606,
> >> blockCacheHitRatio=76, fsReadLatency=0, fsWriteLatency=0, fsSyncLatency=0
> >>
> >> Region Server 2:
> >> request=0.0, regions=16, stores=16, storefiles=39, storefileIndexSize=4,
> >> memstoreSize=225, compactionQueueSize=0, usedHeap=3380, maxHeap=4079,
> >> blockCacheSize=643172800, blockCacheFree=212334144, blockCacheCount=9660,
> >> blockCacheHitRatio=69, fsReadLatency=0, fsWriteLatency=0, fsSyncLatency=0
> >>
> >> Region Server 3:
> >> request=0.0, regions=13, stores=13, storefiles=31, storefileIndexSize=4,
> >> memstoreSize=177, compactionQueueSize=0, usedHeap=1905, maxHeap=4079,
> >> blockCacheSize=682848608, blockCacheFree=172658336, blockCacheCount=10262,
> >> blockCacheHitRatio=72, fsReadLatency=0, fsWriteLatency=0, fsSyncLatency=0
> >>
> >> The average blockCacheHitRatio is about 72.  Is this too low?  Anything
> >> else I can check?
> >>
> >> -James
> >>
> >>
> >> On Mon, 2010-02-15 at 18:16 -0600, Dan Washusen wrote:
> >> > Maybe the block cache is thrashing?
> >> >
> >> > If you are regularly writing data to your tables then it's possible that the
> >> > block cache is no longer being effective.  On the region server web UI check
> >> > the blockCacheHitRatio value.  You want this value to be high (0 - 100).  If
> >> > this value is low it means that HBase has to go to disk to fetch blocks of
> >> > data.  You can control the amount of VM memory that HBase allocates to the
> >> > block cache using the "hfile.block.cache.size" property (default is 0.2
> >> > (20%)).
> >> >
> >> > Cheers,
> >> > Dan
> >> >
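
(The property Dan describes is a region-server-side setting in hbase-site.xml
and takes effect after a region server restart.  Purely as an illustration,
using 0.4, the value tried elsewhere in this thread, as the example figure;
the right fraction for a given cluster depends on the read/write mix.)

    <!-- hbase-site.xml on each region server -->
    <property>
      <name>hfile.block.cache.size</name>
      <!-- fraction of region server heap given to the block cache; default 0.2 -->
      <value>0.4</value>
    </property>
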
> >> > On 16 February 2010 10:45, James Baldassari <james@dataxu.com> wrote:
> >> >
> >> > > Hi,
> >> > >
> >> > > Does anyone have any tips to share regarding optimization for random
> >> > > read performance?  For writes I've found that setting a large write
> >> > > buffer and setting auto-flush to false on the client side significantly
> >> > > improved put performance.  Are there any similar easy tweaks to improve
> >> > > random read performance?
> >> > >
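
(As an aside, the write-side tweaks mentioned above map onto the 0.20.x client
API roughly as follows.  This is only a sketch, not the code actually in use
here; the table name, column names, and buffer size are invented for the
example.)

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BufferedWriteExample {
        public static void main(String[] args) throws Exception {
            HBaseConfiguration conf = new HBaseConfiguration();
            HTable table = new HTable(conf, "mytable");

            // Buffer puts on the client and send them in batches
            // instead of one RPC per put.
            table.setAutoFlush(false);
            table.setWriteBufferSize(12 * 1024 * 1024);  // e.g. 12 MB

            Put put = new Put(Bytes.toBytes("row-1"));
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("value"));
            table.put(put);
            // ... many more puts ...

            // Push anything still sitting in the client-side write buffer.
            table.flushCommits();
        }
    }
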
> >> > > I'm using HBase 0.20.3 in a very read-heavy real-time system with 1
> >> > > master and 3 region servers.  It was working ok for a while, but today
> >> > > there was a severe degradation in read performance.  Restarting Hadoop
> >> > > and HBase didn't help, and there are no errors in the logs.  Read
> >> > > performance starts off around 1,000-2,000 gets/second but quickly
> >> > > (within minutes) drops to around 100 gets/second.
> >> > >
> >> > > I've already looked at the performance tuning wiki page.  On the server
> >> > > side I've increased hbase.regionserver.handler.count from 10 to 100, but
> >> > > it didn't help.  Maybe this is expected because I'm only using a single
> >> > > client to do reads.  I'm working on implementing a client pool now, but
> >> > > I'm wondering if there are any other settings on the server or client
> >> > > side that might improve things.
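
(Likewise, the handler-count change mentioned above is a region server setting
in hbase-site.xml, shown here only to make that step concrete, with the value
James tried:)

    <!-- hbase-site.xml on each region server -->
    <property>
      <name>hbase.regionserver.handler.count</name>
      <!-- RPC handler threads per region server; default 10 -->
      <value>100</value>
    </property>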
> >> > >
> >> > > Thanks,
> >> > > James
> >> > >
> >> > >
> >> > >
> >>
> >>
> >

