hbase-user mailing list archives

From Dan Washusen <...@reactive.org>
Subject Re: Optimizations for random read performance
Date Tue, 16 Feb 2010 08:09:37 GMT
Adding another node would defer the problem a little longer, but enabling LZO
would make the biggest difference!  The compression it achieved on my test
cluster stunned me!  I can't recall the exact numbers, but it was something
like an order-of-magnitude improvement in file size.  If you apply it to
your table you won't have to worry about the block cache for a while... :)

I'm not sure how to apply LZO to an existing table.  One of the other users
will need to help you there...
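
If I had to guess, it's roughly something like this from the shell (untested,
so please double-check -- 'mytable' and 'cf' below are placeholder names, and
the LZO native libraries need to be installed on every node first):

  disable 'mytable'
  alter 'mytable', {NAME => 'cf', COMPRESSION => 'LZO'}
  enable 'mytable'
  major_compact 'mytable'   # existing store files only pick up the codec as compactions rewrite them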

On 16 February 2010 18:21, James Baldassari <james@dataxu.com> wrote:

> No, we don't have LZO on the table right now.  I guess that's something
> else that we can try.  I'll ask our ops team if we can steal another
> node or two for the cluster if you think that will help.  I'll report
> back with results as soon as I can.  Thanks again for working with me on
> this!  This is definitely the most responsive users list I've ever
> posted to.
>
> -James
>
>
> On Tue, 2010-02-16 at 01:11 -0600, Dan Washusen wrote:
> > You could just add another node to your cluster to solve the immediate
> > problem.  Then keep an eye on load, etc to preemptively add more nodes as
> > needed?
> >
> > Out of interest do you have LZO compression enabled on your table?  That
> > makes the block cache and IO ops much more effective...
> >
> > Regarding GC logging:
> > Also, another option for GC logging is 'jstat'.  For example, running the
> > following command will print out the VM heap utilization every 1 second:
> >
> > > jstat -gcutil <pid> 1000
> > >
> >
> > The last column shows the total amount of time (in seconds) spent garbage
> > collecting.  You want to see very small increments...  The other
> > interesting columns are "O" and "E".  They show the percentage of Old and
> > Eden used.  If old gen is staying up in the high 90's then there are more
> > long-lived objects than available memory...
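> >
> > For example (these numbers are made up, just to show the shape of the
> > output), a single sample line looks like:
> >
> > >   S0     S1     E      O      P     YGC    YGCT   FGC    FGCT     GCT
> > >  0.00  24.96  67.12  42.33  59.60    213   1.842    17   5.107   6.949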
> >
> > Cheers,
> > Dan
> >
> > On 16 February 2010 17:54, James Baldassari <james@dataxu.com> wrote:
> >
> > > How much should I give the region servers?  That machine is already
> > > overallocated, by which I mean that the sum of the max heap sizes of all
> > > java processes running there is greater than the amount of physical
> > > memory, which can lead to swapping.  We have: Hadoop data node, Hadoop
> > > task tracker, ZooKeeper peer, and region server.  The machine has 8G of
> > > physical memory.  The region server currently has a max heap size of 4G.
> > > Should I increase to 6G?  Should I decrease the block cache back down to
> > > 20% or even lower?  Do we need to move to a 16G server?
> > >
> > > Thanks,
> > > James
> > >
> > >
> > > On Tue, 2010-02-16 at 00:48 -0600, Dan Washusen wrote:
> > > > 32% IO on region server 3!  Ouch! :)
> > > >
> > > > Increasing the block cache to 40% of VM memory without upping the total
> > > > available memory may only have exacerbated the issue.  I notice that
> > > > region server 2 was already using 3300mb of the 4000mb heap.  By
> > > > increasing the block cache size to 40% you have now given the block
> > > > cache 1600mb compared to the previous 800mb...
> > > >
> > > > Can you give the region servers more memory?
> > > >
> > > > Cheers,
> > > > Dan
> > > >
> > > > On 16 February 2010 17:42, James Baldassari <james@dataxu.com> wrote:
> > > >
> > > > > On Tue, 2010-02-16 at 00:14 -0600, Stack wrote:
> > > > > > On Mon, Feb 15, 2010 at 10:05 PM, James Baldassari <james@dataxu.com> wrote:
> > > > > > >  Applying HBASE-2180 isn't really an option at this
> > > > > > > time because we've been told to stick with the Cloudera distro.
> > > > > >
> > > > > > I'm sure they wouldn't mind (smile).  Seems to about double throughput.
> > > > >
> > > > > Hmm, well I might be able to convince them ;)
> > > > >
> > > > > >
> > > > > >
> > > > > > > If I had to guess, I would say the performance issues start to
> > > > > > > happen around the time the region servers hit max heap size, which
> > > > > > > occurs within minutes of exposing the app to live traffic.  Could GC
> > > > > > > be killing us?  We use the concurrent collector as suggested.  I saw
> > > > > > > on the performance page some mention of limiting the size of the new
> > > > > > > generation like -XX:NewSize=6m -XX:MaxNewSize=6m.  Is that worth trying?
> > > > > >
> > > > > > Enable GC logging for a while?  See hbase-env.sh.  Uncomment this line:
> > > > > >
> > > > > > # export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails
> > > > > > -XX:+PrintGCDateStamps -Xloggc:$HBASE_HOME/logs/gc-hbase.log"
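> > > > > >
> > > > > > If you want to try the NewSize cap you mention at the same time, the
> > > > > > uncommented line would look something like this (the sizes are just a
> > > > > > starting point, adjust to taste):
> > > > > >
> > > > > > export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC \
> > > > > >     -XX:NewSize=6m -XX:MaxNewSize=6m \
> > > > > >     -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
> > > > > >     -Xloggc:$HBASE_HOME/logs/gc-hbase.log"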
> > > > >
> > > > > I did uncomment that line, but I can't figure out where the gc-hbase.log
> > > > > is.  It's not with the other logs.  When starting HBase the GC output
> > > > > seems to be going to stdout rather than the file.  Maybe a Cloudera
> > > > > thing.  I'll do some digging.
> > > > >
> > > > > >
> > > > > > You are using recent JVM?  1.6.0_10 or greater?  1.6.0_18 might have
> > > > > > issues.
> > > > >
> > > > > We're on 1.6.0_16 at the moment.
> > > > >
> > > > > >
> > > > > > What do CPU and iowait or wa in top look like on these machines,
> > > > > > particularly the loaded machine?
> > > > > >
> > > > > > How many disks in the machines?
> > > > >
> > > > > I'll have to ask our ops guys about the disks.  The high load has now
> > > > > switched from region server 1 to 3.  I just saw in our logs that it took
> > > > > 139383.065 milliseconds to do 5000 gets, ~36 gets/second, ouch.  Here
> > > > > are the highlights from top for each region server:
> > > > >
> > > > > Region Server 1:
> > > > > top - 01:39:41 up 4 days, 13:44,  4 users,  load average: 1.89, 0.99, 1.19
> > > > > Tasks: 194 total,   1 running, 193 sleeping,   0 stopped,   0 zombie
> > > > > Cpu(s): 15.6%us,  5.8%sy,  0.0%ni, 76.9%id,  0.0%wa,  0.1%hi,  1.6%si,  0.0%st
> > > > > Mem:   8166588k total,  8112812k used,    53776k free,     8832k buffers
> > > > > Swap:  1052248k total,      152k used,  1052096k free,  2831076k cached
> > > > >  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> > > > > 21961 hadoop    19   0 4830m 4.2g  10m S 114.3 53.6  37:26.58 java
> > > > > 21618 hadoop    21   0 4643m 578m 9804 S 66.1  7.3  19:06.89 java
> > > > >
> > > > > Region Server 2:
> > > > > top - 01:40:28 up 4 days, 13:43,  4 users,  load average: 3.93, 2.17, 1.39
> > > > > Tasks: 194 total,   1 running, 193 sleeping,   0 stopped,   0 zombie
> > > > > Cpu(s): 11.3%us,  3.1%sy,  0.0%ni, 83.4%id,  1.2%wa,  0.1%hi,  0.9%si,  0.0%st
> > > > > Mem:   8166588k total,  7971572k used,   195016k free,    34972k buffers
> > > > > Swap:  1052248k total,      152k used,  1052096k free,  2944712k cached
> > > > >  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> > > > > 15752 hadoop    18   0 4742m 4.1g  10m S 210.6 53.1  41:52.80 java
> > > > > 15405 hadoop    20   0 4660m 317m 9800 S 114.0  4.0  27:34.17 java
> > > > >
> > > > > Region Server 3:
> > > > > top - 01:40:35 up 2 days,  9:04,  4 users,  load average: 10.15, 11.05, 11.79
> > > > > Tasks: 195 total,   1 running, 194 sleeping,   0 stopped,   0 zombie
> > > > > Cpu(s): 28.7%us, 10.1%sy,  0.0%ni, 25.8%id, 32.9%wa,  0.1%hi,  2.4%si,  0.0%st
> > > > > Mem:   8166572k total,  8118592k used,    47980k free,     3264k buffers
> > > > > Swap:  1052248k total,      140k used,  1052108k free,  2099896k cached
> > > > >  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> > > > > 15636 hadoop    18   0 4806m 4.2g  10m S 206.9 53.3  87:48.81 java
> > > > > 15243 hadoop    18   0 4734m 1.3g 9800 S 117.6 16.7  63:46.52 java
> > > > >
> > > > > -James
> > > > >
> > > > > >
> > > > > > St>Ack
> > > > > >
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > Here are the new region server stats along with load averages:
> > > > > > >
> > > > > > > Region Server 1:
> > > > > > > request=0.0, regions=16, stores=16, storefiles=33, storefileIndexSize=4,
> > > > > > > memstoreSize=1, compactionQueueSize=0, usedHeap=2891, maxHeap=4079,
> > > > > > > blockCacheSize=1403878072, blockCacheFree=307135816, blockCacheCount=21107,
> > > > > > > blockCacheHitRatio=84, fsReadLatency=0, fsWriteLatency=0, fsSyncLatency=0
> > > > > > > Load Averages: 10.34, 10.58, 7.08
> > > > > > >
> > > > > > > Region Server 2:
> > > > > > > request=0.0, regions=15, stores=16, storefiles=26, storefileIndexSize=3,
> > > > > > > memstoreSize=1, compactionQueueSize=0, usedHeap=3257, maxHeap=4079,
> > > > > > > blockCacheSize=661765368, blockCacheFree=193741576, blockCacheCount=9942,
> > > > > > > blockCacheHitRatio=77, fsReadLatency=0, fsWriteLatency=0, fsSyncLatency=0
> > > > > > > Load Averages: 1.90, 1.23, 0.98
> > > > > > >
> > > > > > > Region Server 3:
> > > > > > > request=0.0, regions=16, stores=16, storefiles=41, storefileIndexSize=4,
> > > > > > > memstoreSize=4, compactionQueueSize=0, usedHeap=1627, maxHeap=4079,
> > > > > > > blockCacheSize=665117184, blockCacheFree=190389760, blockCacheCount=9995,
> > > > > > > blockCacheHitRatio=70, fsReadLatency=0, fsWriteLatency=0, fsSyncLatency=0
> > > > > > > Load Averages: 2.01, 3.56, 4.18
> > > > > > >
> > > > > > > That first region server is getting hit much harder than the others.
> > > > > > > They're identical machines (8-core), and the distribution of keys should
> > > > > > > be fairly random, so I'm not sure why that would happen.  Any other
> > > > > > > ideas or suggestions would be greatly appreciated.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > James
> > > > > > >
> > > > > > >
> > > > > > > On Mon, 2010-02-15 at 21:51 -0600, Stack wrote:
> > > > > > >> Yeah, I was going to say that if your loading is mostly read, you can
> > > > > > >> probably go up from the 0.2 given over to cache.  I like Dan's
> > > > > > >> suggestion of trying it first on one server, if you can.
> > > > > > >>
> > > > > > >> St.Ack
> > > > > > >>
> > > > > > >> On Mon, Feb 15, 2010 at 5:22 PM, Dan Washusen <dan@reactive.org> wrote:
> > > > > > >> > So roughly 72% of reads use the blocks held in the block cache...
> > > > > > >> >
> > > > > > >> > It would be interesting to see the difference between when it was
> > > > > > >> > working OK and now.  Could you try increasing the memory allocated to
> > > > > > >> > one of the regions and also increasing the "hfile.block.cache.size" to
> > > > > > >> > say '0.4' on the same region?
> > > > > > >> >
> > > > > > >> > On 16 February 2010 11:54, James Baldassari <james@dataxu.com> wrote:
> > > > > > >> >
> > > > > > >> >> Hi Dan.  Thanks for your suggestions.  I am doing writes at the same
> > > > > > >> >> time as reads, but there are usually many more reads than writes.  Here
> > > > > > >> >> are the stats for all three region servers:
> > > > > > >> >>
> > > > > > >> >> Region Server 1:
> > > > > > >> >> request=0.0, regions=15, stores=16, storefiles=34, storefileIndexSize=3,
> > > > > > >> >> memstoreSize=308, compactionQueueSize=0, usedHeap=3096, maxHeap=4079,
> > > > > > >> >> blockCacheSize=705474544, blockCacheFree=150032400, blockCacheCount=10606,
> > > > > > >> >> blockCacheHitRatio=76, fsReadLatency=0, fsWriteLatency=0, fsSyncLatency=0
> > > > > > >> >>
> > > > > > >> >> Region Server 2:
> > > > > > >> >> request=0.0, regions=16, stores=16, storefiles=39, storefileIndexSize=4,
> > > > > > >> >> memstoreSize=225, compactionQueueSize=0, usedHeap=3380, maxHeap=4079,
> > > > > > >> >> blockCacheSize=643172800, blockCacheFree=212334144, blockCacheCount=9660,
> > > > > > >> >> blockCacheHitRatio=69, fsReadLatency=0, fsWriteLatency=0, fsSyncLatency=0
> > > > > > >> >>
> > > > > > >> >> Region Server 3:
> > > > > > >> >> request=0.0, regions=13, stores=13, storefiles=31, storefileIndexSize=4,
> > > > > > >> >> memstoreSize=177, compactionQueueSize=0, usedHeap=1905, maxHeap=4079,
> > > > > > >> >> blockCacheSize=682848608, blockCacheFree=172658336, blockCacheCount=10262,
> > > > > > >> >> blockCacheHitRatio=72, fsReadLatency=0, fsWriteLatency=0, fsSyncLatency=0
> > > > > > >> >>
> > > > > > >> >> The average blockCacheHitRatio is about 72.  Is this too low?  Anything
> > > > > > >> >> else I can check?
> > > > > > >> >>
> > > > > > >> >> -James
> > > > > > >> >>
> > > > > > >> >>
> > > > > > >> >> On Mon, 2010-02-15 at 18:16 -0600, Dan Washusen wrote:
> > > > > > >> >> > Maybe the block cache is thrashing?
> > > > > > >> >> >
> > > > > > >> >> > If you are regularly writing data to your tables then it's possible
> > > > > > >> >> > that the block cache is no longer being effective.  On the region
> > > > > > >> >> > server web UI check the blockCacheHitRatio value.  You want this value
> > > > > > >> >> > to be high (0 - 100).  If this value is low it means that HBase has to
> > > > > > >> >> > go to disk to fetch blocks of data.  You can control the amount of VM
> > > > > > >> >> > memory that HBase allocates to the block cache using the
> > > > > > >> >> > "hfile.block.cache.size" property (default is 0.2 (20%)).
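> > > > > > >> >> >
> > > > > > >> >> > For example, to give the block cache 40% of the heap you would put
> > > > > > >> >> > something like this in hbase-site.xml on each region server and
> > > > > > >> >> > restart it (0.4 here is only an example value):
> > > > > > >> >> >
> > > > > > >> >> >   <property>
> > > > > > >> >> >     <name>hfile.block.cache.size</name>
> > > > > > >> >> >     <value>0.4</value>
> > > > > > >> >> >   </property>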
> > > > > > >> >> >
> > > > > > >> >> > Cheers,
> > > > > > >> >> > Dan
> > > > > > >> >> >
> > > > > > >> >> > On 16 February 2010 10:45, James Baldassari <james@dataxu.com> wrote:
> > > > > > >> >> >
> > > > > > >> >> > > Hi,
> > > > > > >> >> > >
> > > > > > >> >> > > Does anyone have any tips to share regarding optimization for random
> > > > > > >> >> > > read performance?  For writes I've found that setting a large write
> > > > > > >> >> > > buffer and setting auto-flush to false on the client side significantly
> > > > > > >> >> > > improved put performance.  Are there any similar easy tweaks to improve
> > > > > > >> >> > > random read performance?
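> > > > > > >> >> > >
> > > > > > >> >> > > (For reference, the write-side settings I mean are roughly the
> > > > > > >> >> > > following; the table and column names here are made up:)
> > > > > > >> >> > >
> > > > > > >> >> > > import org.apache.hadoop.hbase.HBaseConfiguration;
> > > > > > >> >> > > import org.apache.hadoop.hbase.client.HTable;
> > > > > > >> >> > > import org.apache.hadoop.hbase.client.Put;
> > > > > > >> >> > > import org.apache.hadoop.hbase.util.Bytes;
> > > > > > >> >> > >
> > > > > > >> >> > > HTable table = new HTable(new HBaseConfiguration(), "mytable");
> > > > > > >> >> > > table.setAutoFlush(false);                   // don't flush on every put
> > > > > > >> >> > > table.setWriteBufferSize(12 * 1024 * 1024);  // 12MB client-side buffer
> > > > > > >> >> > >
> > > > > > >> >> > > Put put = new Put(Bytes.toBytes("some-row"));
> > > > > > >> >> > > put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value"));
> > > > > > >> >> > > table.put(put);        // buffered locally until the write buffer fills
> > > > > > >> >> > > table.flushCommits();  // or flush explicitly when done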
> > > > > > >> >> > >
> > > > > > >> >> > > I'm using HBase 0.20.3 in a very read-heavy real-time system with 1
> > > > > > >> >> > > master and 3 region servers.  It was working ok for a while, but today
> > > > > > >> >> > > there was a severe degradation in read performance.  Restarting Hadoop
> > > > > > >> >> > > and HBase didn't help, and there are no errors in the logs.  Read
> > > > > > >> >> > > performance starts off around 1,000-2,000 gets/second but quickly
> > > > > > >> >> > > (within minutes) drops to around 100 gets/second.
> > > > > > >> >> > >
> > > > > > >> >> > > I've already looked at the performance tuning wiki page.  On the server
> > > > > > >> >> > > side I've increased hbase.regionserver.handler.count from 10 to 100, but
> > > > > > >> >> > > it didn't help.  Maybe this is expected because I'm only using a single
> > > > > > >> >> > > client to do reads.  I'm working on implementing a client pool now, but
> > > > > > >> >> > > I'm wondering if there are any other settings on the server or client
> > > > > > >> >> > > side that might improve things.
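> > > > > > >> >> > >
> > > > > > >> >> > > (The handler change, for reference, is just this property in
> > > > > > >> >> > > hbase-site.xml on the region servers:)
> > > > > > >> >> > >
> > > > > > >> >> > >   <property>
> > > > > > >> >> > >     <name>hbase.regionserver.handler.count</name>
> > > > > > >> >> > >     <value>100</value>
> > > > > > >> >> > >   </property>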
> > > > > > >> >> > >
> > > > > > >> >> > > Thanks,
> > > > > > >> >> > > James
> > > > > > >> >> > >
> > > > > > >> >> > >
> > > > > > >> >> > >
> > > > > > >> >>
> > > > > > >> >>
> > > > > > >> >
> > > > > > >
> > > > > > >
> > > > >
> > > > >
> > >
> > >
>
>
