hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: High CPU Utilization by meta region
Date Wed, 30 Nov 2016 05:51:04 GMT
On Mon, Nov 28, 2016 at 10:25 AM, Timothy Brown <tim@siftscience.com> wrote:

> Responses inlined.
>
> ...

> > >
> > > What is the difference when you compare servers? More requests? More
> i/o?
> > Thread dump the metadata server and let us see a link in here? (What you
> > attached below is cut-off... just as it is getting to the good part).
> >
> >
> > There are more requests to the server containing meta. The network in
> bytes are greater for the meta regionserver than the others but the network
> out bytes are less.
>
> Here's a dropbox link to the output:
> https://dl.dropboxusercontent.com/u/54494127/thread_dump.txt
> I apologize for the cliffhanger.
>
>
The in bytes are < the out bytes on the hbase:meta server? Or compared to
other servers? Queries are usually smaller than responses, and in the
hbase:meta case I'd think we'd be mostly querying/reading, with out much
bigger than in.

Anything else running on this machine besides Master?

If you turn on RPC-level TRACE logging for a minute or so, anything about
the client addresses that seems interesting?
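
To make "which clients are hitting meta most" concrete, here is a minimal
sketch that tallies client IPs found in captured log lines. It does not
assume any particular HBase log layout: it only looks for IPv4 host:port
tokens, so the server's own address may show up in the counts too. The
function name top_clients is made up for illustration.

```python
import re
from collections import Counter

# Matches an IPv4 address followed by a port, e.g. 10.0.0.12:51234.
# Assumption: client addresses appear in the log as ip:port tokens.
ADDR_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3}):\d+\b")


def top_clients(lines, n=10):
    """Return the n most frequent IPs seen in the given log lines."""
    counts = Counter()
    for line in lines:
        # findall returns only the captured IP part of each match.
        for ip in ADDR_RE.findall(line):
            counts[ip] += 1
    return counts.most_common(n)
```

Feed it the lines captured during the minute of TRACE logging, for example
top_clients(open(path)) with path pointing at a copy of the RegionServer log.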

Looking at the thread dump (thanks), you have 1k handlers running?

Thread 1037 (B.defaultRpcServer.handler=999,queue=99,port=60020):

They are all idle in this thread dump (Same for the readers).
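
A quick way to check a dump like this without reading a thousand entries
is to count threads per name prefix and state. The sketch below assumes
the dump uses "Thread N (name):" headers like the one above, each followed
by a "State: X" line; that second part is an assumption about the dump
layout, and summarize is a hypothetical helper name.

```python
import re
from collections import Counter

# Header line such as:
#   Thread 1037 (B.defaultRpcServer.handler=999,queue=99,port=60020):
THREAD_RE = re.compile(r"^Thread \d+ \((.*)\):\s*$")
# Assumed follow-up line such as:  "  State: WAITING"
STATE_RE = re.compile(r"^\s*State:\s*(\S+)")


def summarize(dump_text):
    """Count threads per (name prefix, state) pair."""
    counts = Counter()
    name = None
    for line in dump_text.splitlines():
        header = THREAD_RE.match(line)
        if header:
            name = header.group(1)
            continue
        state = STATE_RE.match(line)
        if state and name is not None:
            # Prefix = thread name up to the first '=' or digit, so all
            # handlers collapse into "B.defaultRpcServer.handler".
            prefix = re.split(r"[=\d]", name, maxsplit=1)[0]
            counts[(prefix, state.group(1))] += 1
            name = None
    return counts
```

A large count of handler threads in a waiting state and few or none
RUNNABLE would match the "all idle" reading; the state alone does not prove
a thread is idle, so spot-check a few stacks.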

I've found that having handlers == # of cpus seems to do best when the
workload is mostly random reads. If lots of writes, good to have a few
extras in case one gets occupied, but 1k is a little OTT. Any particular
reason for this many handlers? Would suggest trying way fewer. Might help w/
CPU. 1k is a lot.
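
As a rough illustration of the handlers == # of cpus rule of thumb, the
sketch below derives a handler count from the core count and prints the
matching hbase-site.xml property (hbase.regionserver.handler.count). The
extra_for_writes knob and both function names are assumptions made up for
this example; treat the result as a starting point to test, not a tuned
value.

```python
import os


def handler_count(extra_for_writes=0, cpus=None):
    """Handlers = number of CPUs, plus a few extras for write-heavy loads."""
    if cpus is None:
        # Must run on the RegionServer host for this to be meaningful.
        cpus = os.cpu_count() or 1
    return cpus + extra_for_writes


def hbase_site_property(count):
    """Render the hbase-site.xml fragment for the given handler count."""
    return (
        "<property>\n"
        "  <name>hbase.regionserver.handler.count</name>\n"
        f"  <value>{count}</value>\n"
        "</property>"
    )


if __name__ == "__main__":
    print(hbase_site_property(handler_count(extra_for_writes=2)))
```

The RegionServer needs a restart to pick up the new value.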

G1GC? (See HBASE-17072 CPU usage starts to climb up to 90-100% when using
G1GC; purge ThreadLocal usage)


>
> >
> > > Here's some more info about our cluster:
> > > HBase version 1.2
> > >
> >
> > Which 1.2?
> >
> > 1.2.0 which is bundled with CDH 5.8.0
>
> >
> >
> > > Number of regions: 72
> > > Number of tables: 97
> > >
> >
> > On whole cluster? (Can't have more tables than regions...)
> >
> >
> > An error on my part, I meant to put 72 region servers.
>
>
> >
> > > Approx. requests per second to meta region server: 3k
> > >
>

That is not much. If it is all cached, it should be able to do way more than that.



> >
> > Can you see who is hitting the meta region most? (Enable rpc-level TRACE
> > logging on the server hosting meta for a minute or so and see where the
> > requests are coming in from).
> >
> > What is your cache hit rate? Can you get it higher?
> >
> > Cache hit rate is above 99%. We see very little disk reads.
>
>
> > Is there much writing going on against meta? Or is cluster stable regards
> > region movement/creation?
> >
> > Writing is very infrequent. The cluster is stable with regards to region
> movement and creation.
>
> >
> >
> > > Approx. requests per second to entire HBase cluster: 90k
> > >
> > > Additional info:
> > >
> > >
> > > From Storefile Metrics:
> > > Stores Num: 1
> > > Storefiles: 1
> > > Storefile Size: 30m
> > > Uncompressed Storefile Size: 30m
>

Super small.

St.Ack




> > > Index Size: 459k
> > >
> > >
> > This from meta table? That is very small.
> >
> > Yes this is from the meta table.
>
>
> >
> > >
> > > I/O for the region server with only meta on it:
> > > 48M bytes in
> > >
> >
> >
> > What's all the writing about?
> >
> > I'm not sure. According to the AWS dashboard there are no disk writes at
> that time.
>
> >
> >
> > > 5.9B bytes out
> > >
> > >
> > This is disk or network? If network, is that 5.9 bytes?
> >
> > This is network and that's 5.9 billion bytes. (I'm using the AWS dashboard
> for this)
>
>
> > Thanks Tim,
> > S
> >
> >
> >
> > > I used the debug dump on the region server's UI but it was too large
> > > for paste bin so here's a portion of it: http://pastebin.com/nkYhEceE
> > >
> > >
> > > Thanks for the help,
> > >
> > > Tim
> > >
> >
>
