hbase-user mailing list archives

From Federico Gaule <fga...@despegar.com>
Subject Re: RPC - Queue Time when handlers are all waiting
Date Tue, 10 Dec 2013 12:05:39 GMT
I've increased hbase.regionserver.replication.handler.count 10x (to 30), but
nothing has changed. rpc.metrics.RpcQueueTime_avg_time still shows
activity :(

Mon Dec 09 14:04:10 EST 2013  REPL IPC Server handler 29 on 60000  WAITING (since 16hrs, 58mins, 56sec ago)  Waiting for a call (since 16hrs, 58mins, 56sec ago)
Mon Dec 09 14:04:10 EST 2013  REPL IPC Server handler 28 on 60000  WAITING (since 16hrs, 58mins, 56sec ago)  Waiting for a call (since 16hrs, 58mins, 56sec ago)
Mon Dec 09 14:04:10 EST 2013  REPL IPC Server handler 27 on 60000  WAITING (since 16hrs, 58mins, 56sec ago)  Waiting for a call (since 16hrs, 58mins, 56sec ago)
Mon Dec 09 14:04:10 EST 2013  REPL IPC Server handler 26 on 60000  WAITING (since 16hrs, 58mins, 56sec ago)  Waiting for a call (since 16hrs, 58mins, 56sec ago)
...
Mon Dec 09 14:04:10 EST 2013  REPL IPC Server handler 2 on 60000  WAITING (since 16hrs, 58mins, 56sec ago)  Waiting for a call (since 16hrs, 58mins, 56sec ago)
Mon Dec 09 14:04:10 EST 2013  REPL IPC Server handler 1 on 60000  WAITING (since 16hrs, 58mins, 56sec ago)  Waiting for a call (since 16hrs, 58mins, 56sec ago)
Mon Dec 09 14:04:10 EST 2013  REPL IPC Server handler 0 on 60000  WAITING (since 16hrs, 58mins, 56sec ago)  Waiting for a call (since 16hrs, 58mins, 56sec ago)
Thanks JM
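
A minimal sketch of one possible explanation, assuming the queue-time metric
records one sample for every call that passes through the call queue,
including calls that an idle handler picks up almost immediately. That
assumption is not confirmed anywhere in this thread, and
QueueTimeMetricSketch is a hypothetical stand-in, not HBase's metrics class.
Under that assumption, RpcQueueTime_num_ops would rise with every replication
call even while all handlers are WAITING, and RpcQueueTime_avg_time would be
small but not necessarily zero:

```java
/** Hypothetical stand-in for a queue-time metric; not HBase's actual class. */
public class QueueTimeMetricSketch {
    private long numOps = 0;       // number of samples recorded so far
    private long totalMillis = 0;  // sum of all recorded queue times

    /** Records one call's queue time; the call is counted even if it waited 0 ms. */
    public void record(long queueMillis) {
        numOps++;
        totalMillis += queueMillis;
    }

    /** Number of calls seen, regardless of how long each one waited. */
    public long numOps() {
        return numOps;
    }

    /** Average queue time in milliseconds, or 0.0 when nothing was recorded. */
    public double avgTime() {
        return numOps == 0 ? 0.0 : (double) totalMillis / numOps;
    }

    public static void main(String[] args) {
        QueueTimeMetricSketch metric = new QueueTimeMetricSketch();
        // Illustrative values only: five calls arrive while every handler is
        // idle; four are dequeued within the same millisecond, one after 1 ms.
        long[] queueMillis = {0, 0, 1, 0, 0};
        for (long q : queueMillis) {
            metric.record(q);
        }
        System.out.println("num_ops=" + metric.numOps()
                + " avg_time=" + metric.avgTime());
    }
}
```

If the real metric behaves like this, a growing num_ops only says that calls
are arriving, not that they are waiting for a free handler.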


2013/12/9 Jean-Marc Spaggiari <jean-marc@spaggiari.org>

> Yes, default value is 3 in 0.94.14. If you have not changed it, then it's
> still 3.
>
> conf.getInt("hbase.regionserver.replication.handler.count", 3);
>
> Keep us posted on the results.
>
> JM
>
>
> 2013/12/9 Federico Gaule <fgaule@despegar.com>
>
> > The default value for hbase.regionserver.replication.handler.count (I can't
> > find what the default is; is it 3?).
> > I'll try increasing that property.
> >
> > Fri Dec 06 12:44:12 EST 2013  REPL IPC Server handler 2 on 60020  WAITING (since 8sec ago)  Waiting for a call (since 8sec ago)
> > Fri Dec 06 12:44:12 EST 2013  REPL IPC Server handler 1 on 60020  WAITING (since 8sec ago)  Waiting for a call (since 8sec ago)
> > Fri Dec 06 12:44:12 EST 2013  REPL IPC Server handler 0 on 60020  WAITING (since 2sec ago)  Waiting for a call (since 2sec ago)
> > Thanks JM
> >
> >
> > 2013/12/9 Jean-Marc Spaggiari <jean-marc@spaggiari.org>
> >
> > > For replication, the handlers used on the slave cluster are configured by
> > > hbase.regionserver.replication.handler.count. What value do you have for
> > > this property?
> > >
> > > JM
> > >
> > >
> > > 2013/12/9 Federico Gaule <fgaule@despegar.com>
> > >
> > > > Here is a thread that describes what I think it should be
> > > > (http://grokbase.com/t/hbase/user/13bmndq53k/average-rpc-queue-time):
> > > >
> > > > "The RpcQueueTime metrics are a measurement of how long individual calls
> > > > stay in this queued state. If your handlers were never 100% occupied, this
> > > > value would be 0. An average of 3 hours is concerning, it basically means
> > > > that when a call comes into the RegionServer it takes on average 3 hours to
> > > > start processing, because handlers are all occupied for that amount of
> > > > time."
> > > >
> > > > Is that correct?
> > > >
> > > >
> > > >
> > > > 2013/12/9 Federico Gaule <fgaule@despegar.com>
> > > >
> > > > > Correct me if I'm wrong, but queues should be used only when all the
> > > > > handlers are busy, shouldn't they?
> > > > > If that's true, I don't get why there is activity related to queues.
> > > > >
> > > > > Maybe I'm missing some piece of knowledge about when HBase uses queues
> > > > > :)
> > > > >
> > > > > Thanks
> > > > >
> > > > >
> > > > > 2013/12/9 Jean-Marc Spaggiari <jean-marc@spaggiari.org>
> > > > >
> > > > >> There might be something I'm missing ;)
> > > > >>
> > > > >> On cluster B, as you said, never more than 50% of your handlers are used.
> > > > >> Your Ganglia metrics are showing that there is activity (num ops is
> > > > >> increasing), which is correct.
> > > > >>
> > > > >> Can you please confirm what you think is wrong from your charts?
> > > > >>
> > > > >> Thanks,
> > > > >>
> > > > >> JM
> > > > >>
> > > > >>
> > > > >> 2013/12/9 Federico Gaule <fgaule@despegar.com>
> > > > >>
> > > > >> > Hi JM,
> > > > >> > Cluster B is only receiving replication data (writes), but handlers are
> > > > >> > waiting most of the time (fewer than 50% of them are ever in use). As I
> > > > >> > have read, the RPC queue is only used when all handlers are busy. Does
> > > > >> > that apply to replication as well?
> > > > >> >
> > > > >> > Thanks!
> > > > >> >
> > > > >> >
> > > > >> > 2013/12/9 Jean-Marc Spaggiari <jean-marc@spaggiari.org>
> > > > >> >
> > > > >> > > Hi,
> > > > >> > >
> > > > >> > > When you say that B doesn't get any read/write operations, does it mean
> > > > >> > > you stopped the replication? Or is B still getting the write operations
> > > > >> > > from A because of the replication? If so, that's why your RPC queue is
> > > > >> > > used...
> > > > >> > >
> > > > >> > > JM
> > > > >> > >
> > > > >> > >
> > > > >> > > 2013/12/9 Federico Gaule <fgaule@despegar.com>
> > > > >> > >
> > > > >> > > > Not much information in the RS logs (DEBUG level set for
> > > > >> > > > org.apache.hadoop.hbase). Here is a sample from one regionserver that
> > > > >> > > > shows increasing rpc.metrics.RpcQueueTime_num_ops and
> > > > >> > > > rpc.metrics.RpcQueueTime_avg_time activity:
> > > > >> > > >
> > > > >> > > > 2013-12-09 08:09:10,699 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=23.14 MB, free=2.73 GB, max=2.75 GB, blocks=0, accesses=122442151, hits=122168501, hitRatio=99.77%, , cachingAccesses=122192927, cachingHits=122162378, cachingHitsRatio=99.97%, , evictions=0, evicted=6768, evictedPerRun=Infinity
> > > > >> > > > 2013-12-09 08:09:11,396 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total replicated: 1
> > > > >> > > > 2013-12-09 08:09:14,979 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total replicated: 2
> > > > >> > > > 2013-12-09 08:09:16,016 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total replicated: 1
> > > > >> > > > ...
> > > > >> > > > 2013-12-09 08:14:07,659 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total replicated: 1
> > > > >> > > > 2013-12-09 08:14:08,713 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total replicated: 3
> > > > >> > > > 2013-12-09 08:14:10,699 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=23.14 MB, free=2.73 GB, max=2.75 GB, blocks=0, accesses=122442151, hits=122168501, hitRatio=99.77%, , cachingAccesses=122192927, cachingHits=122162378, cachingHitsRatio=99.97%, , evictions=0, evicted=6768, evictedPerRun=Infinity
> > > > >> > > > 2013-12-09 08:14:12,711 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total replicated: 1
> > > > >> > > > 2013-12-09 08:14:14,778 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total replicated: 3
> > > > >> > > > ...
> > > > >> > > > 2013-12-09 08:15:09,199 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total replicated: 3
> > > > >> > > > 2013-12-09 08:15:12,243 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total replicated: 2
> > > > >> > > > 2013-12-09 08:15:22,086 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSink: Total replicated: 2
> > > > >> > > >
> > > > >> > > > Thanks
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > 2013/12/7 Bharath Vissapragada <bharathv@cloudera.com>
> > > > >> > > >
> > > > >> > > > > I'd look into the RS logs to see what's happening there. Difficult to
> > > > >> > > > > guess from the given information!
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > > > > On Sat, Dec 7, 2013 at 8:52 PM, Federico Gaule <fgaule@despegar.com>
> > > > >> > > > > wrote:
> > > > >> > > > >
> > > > >> > > > > > Any clue?
> > > > >> > > > > > On Dec 5, 2013 9:49 a.m., "Federico Gaule" <fgaule@despegar.com>
> > > > >> > > > > > wrote:
> > > > >> > > > > >
> > > > >> > > > > > > Hi,
> > > > >> > > > > > >
> > > > >> > > > > > > I have 2 clusters with Master (A) - Slave (B) replication.
> > > > >> > > > > > > B doesn't receive client reads or writes and all handlers (100) are
> > > > >> > > > > > > waiting, but rpc.metrics.RpcQueueTime_num_ops and
> > > > >> > > > > > > rpc.metrics.RpcQueueTime_avg_time report that RPC calls are being queued.
> > > > >> > > > > > > The screenshots below show the Ganglia metrics. How is this behaviour
> > > > >> > > > > > > explained? I have looked for metrics specifications but can't find much
> > > > >> > > > > > > information.
> > > > >> > > > > > >
> > > > >> > > > > > > Handlers
> > > > >> > > > > > > http://i42.tinypic.com/242ssoz.png
> > > > >> > > > > > >
> > > > >> > > > > > > NumOps
> > > > >> > > > > > > http://tinypic.com/r/of2c8k/5
> > > > >> > > > > > >
> > > > >> > > > > > > AvgTime
> > > > >> > > > > > > http://tinypic.com/r/2lsvg5w/5
> > > > >> > > > > > >
> > > > >> > > > > > > Cheers
> > > > >> > > > > > >
> > > > >> > > > > >
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > > > > --
> > > > >> > > > > Bharath Vissapragada
> > > > >> > > > > <http://www.cloudera.com>
> > > > >> > > > >



-- 

Ing. Federico Gaule
Technical Lead - PAM <hotels-pam-it@despegar.com>
Av. Corrientes 746 - Piso 9 - C.A.B.A. (C1043AAU)
tel. +54 (11) 4894-3500

Despegar.com, Inc.
The best price for your trip.

This message is confidential and may contain information protected by
professional privilege.
If you have received this e-mail in error, please notify us immediately by
replying to this e-mail and then delete it from your system.
The contents of this message must not be copied or disclosed to anyone.
