hbase-user mailing list archives

From Shawn Hermans <shawnherm...@gmail.com>
Subject Re: Average RPC Queue Time
Date Wed, 20 Nov 2013 19:15:49 GMT
Thanks for all the help.  Follow-up question.  Is it normal to see the
average RPC call queue length stay at over 100 for times of peak usage?


On Wed, Nov 20, 2013 at 12:09 PM, Bryan Beaudreault <
bbeaudreault@hubspot.com> wrote:

> I'm not sure why it is so much higher than your rpc timeout.  Enabling
> DEBUG log level on org.apache.hadoop.ipc.HBaseServer.trace and
> org.apache.hadoop.ipc.HBaseServer loggers might provide you with some
> insight.
>
>
> On Wed, Nov 20, 2013 at 12:55 PM, Shawn Hermans <shawnhermans@gmail.com>
> wrote:
>
> > Shouldn't be.  Looks like Cloudera just converts it to nicer values.  So
> > the actual peak value is 14438088.62 ms for Average RPC queue time.
> >
> >
> > On Wed, Nov 20, 2013 at 11:51 AM, Bryan Beaudreault <
> > bbeaudreault@hubspot.com> wrote:
> >
> > > I'm not sure about the cloudera manager ui, but the metric posted to
> > > JMX is in milliseconds.  Are we sure that is not accounting for the
> > > confusion?
> > >
> > >
> > > On Wed, Nov 20, 2013 at 12:46 PM, Shawn Hermans <shawnhermans@gmail.com>
> > > wrote:
> > >
> > > > Our hbase.rpc.timeout is set for 60 seconds.  Confused as to why I
> > > > would see such large values for the average rpc queue time.  Are
> > > > there any other metrics? The RPC call queue length is consistently
> > > > between 150 and 200 during peak usage time.  Is this normal?
> > > >
> > > > Regards,
> > > > Shawn
> > > >
> > > >
> > > > On Wed, Nov 20, 2013 at 11:24 AM, Jean-Marc Spaggiari <
> > > > jean-marc@spaggiari.org> wrote:
> > > >
> > > > > But that will depend on the timeout that they have configured,
> > > > > right?
> > > > >
> > > > > I have seen some third party applications recommending to increase
> > > > > timeouts to 1h30...
> > > > >
> > > > > JMS
> > > > > On 2013-11-20 12:08, "Vladimir Rodionov" <vrodionov@carrieriq.com>
> > > > > wrote:
> > > > >
> > > > > > > >>The RpcQueueTime metrics are a measurement of how long individual
> > > > > > > >>calls stay in this queued state.  If your handlers were never
> > > > > > > >>100% occupied, this value would be 0.  An average of 3 hours is
> > > > > > > >>concerning, it basically means that when a call comes into the
> > > > > > > >>RegionServer it takes on average 3 hours to start processing,
> > > > > > > >>because handlers are all occupied for that amount of time.
> > > > > >
> > > > > > > Definitely, this metric is meaningless because default RPC timeout
> > > > > > > is 60 sec and under no circumstances call data can survive this 60
> > > > > > > sec in a callQueue unless we have a bug.
> > > > > >
> > > > > > Best regards,
> > > > > > Vladimir Rodionov
> > > > > > Principal Platform Engineer
> > > > > > Carrier IQ, www.carrieriq.com
> > > > > > e-mail: vrodionov@carrieriq.com
> > > > > >
> > > > > > ________________________________________
> > > > > > From: Bryan Beaudreault [bbeaudreault@hubspot.com]
> > > > > > Sent: Wednesday, November 20, 2013 8:55 AM
> > > > > > To: user@hbase.apache.org
> > > > > > Subject: Re: Average RPC Queue Time
> > > > > >
> > > > > > > A regionserver is configured with a certain number of RPC handlers
> > > > > > > (hbase.regionserver.handler.count).  When these handlers are all
> > > > > > > occupied, the calls back up into a callQueue.  This call queue is
> > > > > > > bounded by ipc.server.max.callqueue.size (defaulting to 1GB of
> > > > > > > serialized requests) and ipc.server.max.callqueue.length (10 *
> > > > > > > numHandlers).  So, with 5 handlers a maximum of 50 calls will be
> > > > > > > queued up before requests are rejected outright.
> > > > > >
> > > > > > > The RpcQueueTime metrics are a measurement of how long individual
> > > > > > > calls stay in this queued state.  If your handlers were never 100%
> > > > > > > occupied, this value would be 0.  An average of 3 hours is
> > > > > > > concerning, it basically means that when a call comes into the
> > > > > > > RegionServer it takes on average 3 hours to start processing,
> > > > > > > because handlers are all occupied for that amount of time.
> > > > > >
> > > > > > You can lower time through a few options:
> > > > > >
> > > > > > > - Up the max number of handlers (beware using too many, as this
> > > > > > >   just shifts load to the disks, and also takes more memory)
> > > > > > > - Make your requests smaller (use caching or batching on a scan
> > > > > > >   to return less data per RPC call)
> > > > > > > - Lower your client-side timeouts, so that you can handle the
> > > > > > >   issue on the client side (i.e. retries)
> > > > > > > - Investigate disk or network issues that could be causing
> > > > > > >   extremely slow response times (ensure data is 100% local, too)
> > > > > >
> > > > > > > Just for perspective, the nominal operating value of this probably
> > > > > > > varies greatly with the workload/environment, but in our clusters
> > > > > > > we have an Average RPC Queue Time of near 0.  We only see the
> > > > > > > callQueue fill up in the case of real problems, and almost always
> > > > > > > respond with immediate redistribution of data to other servers.
> > > > > >
> > > > > > HTH
> > > > > >
> > > > > >  - Bryan
> > > > > >
> > > > > >
> > > > > > > On Wed, Nov 20, 2013 at 11:31 AM, Shawn Hermans
> > > > > > > <shawnhermans@gmail.com> wrote:
> > > > > >
> > > > > > > > I am using CDH 4.3.1 with HBase 0.94.6.  Using Cloudera manager,
> > > > > > > > I notice a metric called Average RPC Queue Time is abnormal.  It
> > > > > > > > is over 3 hours normally and drops to a few minutes during
> > > > > > > > non-peak times.  What is the meaning of this metric? Are these
> > > > > > > > high queue times normal?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Shawn
> > > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
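
The numbers quoted in this thread can be sanity-checked with a little
arithmetic. The sketch below is only an illustration, not HBase code: the
function names are made up here, and the factor of 10 queued calls per
handler is taken from Bryan's description of
ipc.server.max.callqueue.length for this HBase version, so treat it as an
assumption on other versions.

```python
# Illustrative helpers only; none of these names exist in HBase.
# Assumption (from the thread): the call queue length limit defaults to
# 10 * hbase.regionserver.handler.count.
CALLS_PER_HANDLER = 10

MS_PER_HOUR = 3_600_000.0


def max_queued_calls(handler_count, calls_per_handler=CALLS_PER_HANDLER):
    """Upper bound on queued calls before new requests are rejected."""
    return handler_count * calls_per_handler


def min_handlers_for_queue_length(queue_length,
                                  calls_per_handler=CALLS_PER_HANDLER):
    """Smallest handler count whose length limit admits this queue length."""
    # Integer ceiling division without importing math.
    return -(-queue_length // calls_per_handler)


def ms_to_hours(milliseconds):
    """Convert a JMX-style millisecond value to hours."""
    return milliseconds / MS_PER_HOUR


if __name__ == "__main__":
    # Bryan's example: 5 handlers allow at most 50 queued calls.
    print(max_queued_calls(5))
    # Shawn's observed peak queue length of 200 implies at least this
    # many handlers, if the default length limit is in effect.
    print(min_handlers_for_queue_length(200))
    # Shawn's reported peak of 14438088.62 ms, expressed in hours.
    print(round(ms_to_hours(14438088.62), 2))
    # The same value as a multiple of the 60 s hbase.rpc.timeout.
    print(round(14438088.62 / 60_000, 1))
```

Under these assumptions the reported peak works out to roughly 4 hours,
a few hundred times the 60 second RPC timeout, which is the discrepancy
Vladimir and Bryan are pointing at.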
