hbase-user mailing list archives

From Bryan Beaudreault <bbeaudrea...@hubspot.com>
Subject Re: Average RPC Queue Time
Date Wed, 20 Nov 2013 17:51:14 GMT
I'm not sure about the Cloudera Manager UI, but the metric posted to JMX is
in milliseconds. Are we sure that isn't the source of the confusion?
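A small Python sketch of the arithmetic behind this question. It assumes, purely as a hypothesis that nothing in the thread confirms, that the UI shows the raw millisecond value as if it were seconds:

```python
# Hypothesis only: the raw RpcQueueTime value is in milliseconds (as posted
# to JMX), but a UI treats it as seconds when rendering "3 hours".
SECONDS_PER_HOUR = 3600

displayed_hours = 3
raw_value = displayed_hours * SECONDS_PER_HOUR  # 10800, interpreted as seconds

actual_seconds = raw_value / 1000  # the same 10800 interpreted as milliseconds

print(f"raw value: {raw_value}")
print(f"read as seconds: {raw_value / SECONDS_PER_HOUR:.1f} h")
print(f"read as milliseconds: {actual_seconds:.1f} s")
```

Under that assumption, a reading of "3 hours" would correspond to about 10.8 seconds, which is well under the 60-second hbase.rpc.timeout mentioned below.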


On Wed, Nov 20, 2013 at 12:46 PM, Shawn Hermans <shawnhermans@gmail.com> wrote:

> Our hbase.rpc.timeout is set to 60 seconds, so I'm confused as to why I
> would see such large values for the average RPC queue time. Are there any
> other metrics worth checking? The RPC call queue length is consistently
> between 150 and 200 during peak usage. Is this normal?
>
> Regards,
> Shawn
>
>
> On Wed, Nov 20, 2013 at 11:24 AM, Jean-Marc Spaggiari <jean-marc@spaggiari.org> wrote:
>
> > But that will depend on the timeout that they have configured, right?
> >
> > I have seen some third-party applications recommend increasing timeouts
> > to 1h30...
> >
> > JMS
> > On 2013-11-20 12:08, "Vladimir Rodionov" <vrodionov@carrieriq.com> wrote:
> >
> > > >>The RpcQueueTime metrics are a measurement of how long individual calls
> > > >>stay in this queued state. If your handlers were never 100% occupied,
> > > >>this value would be 0. An average of 3 hours is concerning; it basically
> > > >>means that when a call comes into the RegionServer it takes on average
> > > >>3 hours to start processing, because handlers are all occupied for that
> > > >>amount of time.
> > >
> > > Definitely. This metric is meaningless, because the default RPC timeout
> > > is 60 sec, and under no circumstances can call data survive 60 sec in the
> > > callQueue unless we have a bug.
> > >
> > > Best regards,
> > > Vladimir Rodionov
> > > Principal Platform Engineer
> > > Carrier IQ, www.carrieriq.com
> > > e-mail: vrodionov@carrieriq.com
> > >
> > > ________________________________________
> > > From: Bryan Beaudreault [bbeaudreault@hubspot.com]
> > > Sent: Wednesday, November 20, 2013 8:55 AM
> > > To: user@hbase.apache.org
> > > Subject: Re: Average RPC Queue Time
> > >
> > > A RegionServer is configured with a certain number of RPC handlers
> > > (hbase.regionserver.handler.count). When these handlers are all occupied,
> > > calls back up into a callQueue. This call queue is bounded by
> > > ipc.server.max.callqueue.size (defaulting to 1GB of serialized requests)
> > > and ipc.server.max.callqueue.length (defaulting to 10 * numHandlers). So,
> > > with 5 handlers, a maximum of 50 calls will be queued before requests are
> > > rejected outright.
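A toy Python model of the two bounds described above, to show how the length limit alone caps the queue at 50 calls for 5 handlers. The class and method names are hypothetical and this is not HBase's actual implementation; it only mirrors the count and byte limits as stated in the message.

```python
from collections import deque

MAX_CALLQUEUE_LENGTH_PER_HANDLER = 10            # the "10 * numHandlers" default
DEFAULT_MAX_CALLQUEUE_SIZE = 1024 * 1024 * 1024  # 1GB of serialized requests


class BoundedCallQueue:
    """Hypothetical model of a callQueue bounded by call count and by bytes."""

    def __init__(self, num_handlers, max_size=DEFAULT_MAX_CALLQUEUE_SIZE):
        self.max_length = MAX_CALLQUEUE_LENGTH_PER_HANDLER * num_handlers
        self.max_size = max_size
        self.calls = deque()
        self.queued_bytes = 0

    def offer(self, call_size_bytes):
        """Queue a call if both bounds allow it; return False if rejected."""
        if len(self.calls) >= self.max_length:
            return False
        if self.queued_bytes + call_size_bytes > self.max_size:
            return False
        self.calls.append(call_size_bytes)
        self.queued_bytes += call_size_bytes
        return True

    def take(self):
        """Hand the oldest queued call to a free handler (FIFO)."""
        size = self.calls.popleft()
        self.queued_bytes -= size
        return size


# Assume all 5 handlers stay busy, so nothing is taken off the queue.
queue = BoundedCallQueue(num_handlers=5)
accepted = sum(queue.offer(1024) for _ in range(60))
print(accepted)  # 50: the 51st through 60th calls are rejected
```

If the defaults apply, a sustained queue length of 150 to 200, as reported earlier in the thread, would require at least 15 to 20 handlers, or a raised ipc.server.max.callqueue.length.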
> > >
> > > The RpcQueueTime metrics are a measurement of how long individual calls
> > > stay in this queued state. If your handlers were never 100% occupied,
> > > this value would be 0. An average of 3 hours is concerning; it basically
> > > means that when a call comes into the RegionServer it takes on average
> > > 3 hours to start processing, because handlers are all occupied for that
> > > amount of time.
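To make the "never 100% occupied means 0" point concrete, here is a toy FIFO simulation in Python. The function and its inputs are hypothetical; it illustrates queueing behavior in general and is not how HBase computes RpcQueueTime.

```python
import heapq


def average_queue_time(arrivals, service_times, num_handlers):
    """Average time calls wait for a free handler in a toy FIFO model.

    arrivals: call arrival times in seconds, sorted ascending
    service_times: how long each call occupies a handler once started
    """
    # Min-heap of the times at which each handler next becomes free.
    free_at = [0.0] * num_handlers
    heapq.heapify(free_at)
    total_wait = 0.0
    for arrival, service in zip(arrivals, service_times):
        earliest_free = heapq.heappop(free_at)
        # A call waits only if every handler is still busy when it arrives.
        start = max(arrival, earliest_free)
        total_wait += start - arrival
        heapq.heappush(free_at, start + service)
    return total_wait / len(arrivals)


# Sparse arrivals: handlers are never all occupied, so no call waits.
print(average_queue_time([0, 2, 4], [1, 1, 1], num_handlers=2))     # 0.0
# A burst of 4 one-second calls on 2 handlers: two calls wait 1 s each.
print(average_queue_time([0, 0, 0, 0], [1, 1, 1, 1], num_handlers=2))  # 0.5
# The same burst on 4 handlers: no waiting.
print(average_queue_time([0, 0, 0, 0], [1, 1, 1, 1], num_handlers=4))  # 0.0
```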
> > >
> > > You can lower this time through a few options:
> > >
> > > - Increase the maximum number of handlers (beware of using too many, as
> > >   this just shifts load to the disks and also takes more memory)
> > > - Make your requests smaller (use caching or batching on a scan to return
> > >   less data per RPC call)
> > > - Lower your client-side timeouts, so that you can handle the issue on the
> > >   client side (i.e. retries)
> > > - Investigate disk or network issues that could be causing extremely slow
> > >   response times (ensure data is 100% local, too)
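A generic Python sketch of the "handle it on the client side" option: a retry loop with exponential backoff around a hypothetical `rpc` callable that raises TimeoutError once a client-side timeout expires. The real HBase client has its own retry settings (for example hbase.client.retries.number), which this sketch does not model.

```python
import time


def call_with_retries(rpc, max_attempts=3, base_backoff_s=0.1, sleep=time.sleep):
    """Call `rpc` (hypothetical, zero-argument), retrying on TimeoutError.

    Backs off exponentially between attempts and re-raises the last
    TimeoutError once max_attempts is exhausted.
    """
    for attempt in range(max_attempts):
        try:
            return rpc()
        except TimeoutError:
            if attempt == max_attempts - 1:
                raise
            sleep(base_backoff_s * (2 ** attempt))


# Usage: a fake RPC that times out twice before succeeding.
attempts = []


def flaky_rpc():
    attempts.append(1)
    if len(attempts) < 3:
        raise TimeoutError("call waited too long in the queue")
    return "ok"


result = call_with_retries(flaky_rpc, sleep=lambda seconds: None)
print(result)         # ok
print(len(attempts))  # 3
```

Injecting `sleep` keeps the example fast to run; in practice the default `time.sleep` would be used.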
> > >
> > > Just for perspective, the nominal operating value of this probably varies
> > > greatly with the workload/environment, but in our clusters we have an
> > > Average RPC Queue Time of near 0. We only see the callQueue fill up in
> > > the case of real problems, and almost always respond with immediate
> > > redistribution of data to other servers.
> > >
> > > HTH
> > >
> > >  - Bryan
> > >
> > >
> > > On Wed, Nov 20, 2013 at 11:31 AM, Shawn Hermans <shawnhermans@gmail.com> wrote:
> > >
> > > > I am using CDH 4.3.1 with HBase 0.94.6. Using Cloudera Manager, I
> > > > notice that a metric called Average RPC Queue Time is abnormal. It is
> > > > normally over 3 hours and drops to a few minutes during non-peak
> > > > times. What is the meaning of this metric? Are these high queue times
> > > > normal?
> > > >
> > > > Thanks,
> > > > Shawn
> > > >
> > >
> > >
> >
>
