hbase-user mailing list archives

From mukund murrali <mukundmurra...@gmail.com>
Subject Re: HConnection thread waiting on blocking queue indefinitely
Date Wed, 10 Jun 2015 05:52:04 GMT
We are using HBase 1.0.0. Just before the client stalled, a few handler threads
on the RS were blocked on an MVCC check (thread stack below). I'm not sure
whether that could cause the problem; I don't see anything else unusual in the
RS threads. Also, the same client can connect to the RegionServer again after a
restart. What we can't figure out is what causes the problem at that instant.


java.lang.Thread.State: BLOCKED (on object monitor)
        at java.lang.Object.wait(Native Method)
        at org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:224)
        - locked <0x00000007ac0e0e88> (a java.util.LinkedList)
        at org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.completeMemstoreInsertWithSeqNum(MultiVersionConsistencyControl.java:127)
        at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:2822)
        at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2476)
        at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2430)
        at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2434)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:640)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:604)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1832)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31313)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2031)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
        at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
        at java.lang.Thread.run(Thread.java:745)
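
The trace above shows a handler waiting inside MVCC until earlier writes finish. As a rough illustration of that in-order completion idea, here is a simplified, hypothetical Java model; InOrderCompletion, WriteEntry, begin() and complete() are invented for this sketch and are not HBase's MultiVersionConsistencyControl code. It assumes writes are numbered in arrival order and that a write only becomes visible once every earlier write has completed, so one slow write can leave later handlers waiting on the shared list's monitor.

```java
import java.util.LinkedList;

/**
 * Hypothetical, simplified model of in-order write completion (not HBase's
 * MultiVersionConsistencyControl). Writes are numbered in arrival order, and
 * a write only becomes visible once every earlier write has completed.
 */
class InOrderCompletion {
    /** One in-flight write. */
    static final class WriteEntry {
        final long writeNumber;
        boolean done;

        WriteEntry(long writeNumber) {
            this.writeNumber = writeNumber;
        }
    }

    // Pending writes in arrival order; this list's monitor guards all state below.
    private final LinkedList<WriteEntry> pending = new LinkedList<>();
    private long nextWriteNumber = 1;
    private long readPoint = 0; // highest write number visible to readers

    /** Registers a new write and returns its entry. */
    WriteEntry begin() {
        synchronized (pending) {
            WriteEntry e = new WriteEntry(nextWriteNumber++);
            pending.add(e);
            return e;
        }
    }

    /**
     * Marks a write as done, advances the read point over any finished prefix
     * of the queue, then waits until this write itself is visible. While an
     * earlier write is unfinished, later callers stay parked in wait().
     */
    void complete(WriteEntry e) throws InterruptedException {
        synchronized (pending) {
            e.done = true;
            while (!pending.isEmpty() && pending.getFirst().done) {
                readPoint = pending.removeFirst().writeNumber;
            }
            pending.notifyAll();
            while (readPoint < e.writeNumber) {
                pending.wait();
            }
        }
    }

    long readPoint() {
        synchronized (pending) {
            return readPoint;
        }
    }
}
```

In this model, completing write 2 before write 1 leaves the thread that completed write 2 waiting; completing write 1 afterwards makes both visible and releases it.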




On Tue, Jun 9, 2015 at 6:48 PM, Anoop John <anoop.hbase@gmail.com> wrote:

> Can you see what the threads on the RS are doing at this time? Mainly the
> handlers. Which version of HBase?
>
> On Tuesday, June 9, 2015, mukund murrali <mukundmurrali9@gmail.com> wrote:
> > Hi
> >
> > I wrote a sample program with the default client configuration and created
> > a single connection. From my client application I spawn more client
> > threads than hbase.hconnection.threads.max, and each thread inserts data
> > into the HBase cluster. Once a region split happened, all the hconnection
> > threads (core and max pool size were kept at 256) stalled at
> > BoundedCompletionService.take() indefinitely. Even after the split
> > completed, they never resumed.
> >
> > So does this mean I have to create more connection instances for a cluster
> > in such scenarios (which shouldn't really be needed)? There was no
> > exception on the client side either (I expected a
> > RejectedExecutionException). So can changing hbase.hconnection.threads.max
> > and hbase.hconnection.threads.core cause such a problem?
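
On the RejectedExecutionException expectation: with a plain JDK ThreadPoolExecutor, rejection only happens when all maximumPoolSize threads are busy and the work queue is also full; until then, extra submissions simply wait in the queue. The sketch below shows this with the standard library only. QueuedNotRejected is a hypothetical helper, the queue capacities are arbitrary example values, and it does not reproduce how HBase builds its own connection pool.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/**
 * Generic JDK illustration (not HBase's pool setup): core == max threads,
 * backed by a bounded queue. Extra tasks queue up silently; only when the
 * queue is full does execute() throw RejectedExecutionException.
 */
class QueuedNotRejected {
    /** Submits blocking tasks and returns how many are waiting in the queue. */
    static int queuedAfterSubmit(int threads, int tasks, int queueCapacity) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads, 60, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>(queueCapacity));
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < tasks; i++) {
                // Each task parks until released, so every pool thread stays busy.
                pool.execute(() -> {
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return pool.getQueue().size(); // tasks still waiting for a free thread
        } finally {
            release.countDown(); // let the parked tasks finish
            pool.shutdown();
        }
    }
}
```

For example, 5 tasks on 2 threads with room for 10 queued tasks leave 3 tasks waiting and no exception; with room for only 2, the fifth submission is rejected by the default AbortPolicy.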
> >
> >
> >
> > On Sat, Jun 6, 2015 at 5:02 PM, ramkrishna vasudevan <ramkrishna.s.vasudevan@gmail.com> wrote:
> >
> >> I'm not very sure what the problem could have been when the meta update
> >> happened. I would think that when the region split happened, there was
> >> some issue with the meta update (as you said in your later mail), and the
> >> split regions may not have been updated properly in META. So any client
> >> updates/reads going to this region would have stalled, and hence your
> >> client application also stalled.
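
A toy model of the scenario described above, assuming a client that caches region locations keyed by start row. RegionCacheSketch and its methods are hypothetical, and String keys stand in for byte[] row keys. After a split the cached parent region stops serving, so the client has to evict it and re-read the daughters' locations from META; if META was never updated with the daughters, that re-lookup has nothing correct to find.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Toy region-location cache keyed by region start row (hypothetical class).
 * A region covers [startKey, endKey); an empty endKey means "to the end of
 * the table".
 */
class RegionCacheSketch {
    static final class Region {
        final String name;
        final String startKey;
        final String endKey;

        Region(String name, String startKey, String endKey) {
            this.name = name;
            this.startKey = startKey;
            this.endKey = endKey;
        }

        boolean contains(String row) {
            return row.compareTo(startKey) >= 0
                    && (endKey.isEmpty() || row.compareTo(endKey) < 0);
        }
    }

    private final TreeMap<String, Region> byStartKey = new TreeMap<>();

    void cache(Region r) {
        byStartKey.put(r.startKey, r);
    }

    /** Returns the cached region covering row, or null if the cache has none. */
    Region locate(String row) {
        // The candidate is the region with the greatest start key <= row.
        Map.Entry<String, Region> e = byStartKey.floorEntry(row);
        return (e != null && e.getValue().contains(row)) ? e.getValue() : null;
    }

    /** Drops a stale entry, e.g. once the old parent region stops serving. */
    void evict(Region r) {
        byStartKey.remove(r.startKey, r);
    }
}
```

With only the parent ["", "") cached, locate("q") keeps returning the parent even after it has gone offline. Only after evict(parent) and caching daughters a ["", "m") and b ["m", "") does locate("q") return b.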
> >>
> >> As I said, the logs are important here to know what happened. This could
> >> be one such case, and it could be confirmed from the logs.
> >>
> >> Regards
> >> Ram
> >>
> >> On Sat, Jun 6, 2015 at 1:25 PM, mukund murrali <mukundmurrali9@gmail.com> wrote:
> >>
> >> > Sorry for the confusion in calling it a meta split. It was a meta update
> >> > during a user region split, which probably caused the stall. We have
> >> > now reverted the client configs, and so far we haven't hit the issue
> >> > again. We expected those changes to cause some kind of exception or
> >> > timeout, but clients stalling indefinitely is what worries us.
> >> >
> >> > On Friday 5 June 2015, Vladimir Rodionov <vladrodionov@gmail.com> wrote:
> >> >
> >> > > I would suggest reverting the client config changes back to the
> >> > > defaults. At least we will know whether the issue is somehow related to
> >> > > the client config changes.
> >> > > On Jun 5, 2015 6:15 AM, "ramkrishna vasudevan" <ramkrishna.s.vasudevan@gmail.com> wrote:
> >> > >
> >> > > > hbase:meta getting split? It may be some user region; can you check
> >> > > > that? If your meta was splitting, then there is something wrong.
> >> > > > Can you attach the log snippets?
> >> > > >
> >> > > > Sent from phone. Excuse typos.
> >> > > > On Jun 5, 2015 6:00 PM, "mukund murrali" <mukundmurrali9@gmail.com> wrote:
> >> > > >
> >> > > > > Hi
> >> > > > >
> >> > > > > In our case, at the instant the client thread stalled, an hbase:meta
> >> > > > > region split was happening. So what went wrong? If there is a split,
> >> > > > > why should an hconnection thread stall? Could the client
> >> > > > > configuration changes we made have caused this? Once again, these
> >> > > > > are the client-related changes we made:
> >> > > > >
> >> > > > > hbase.client.retries.number => 5
> >> > > > > zookeeper.recovery.retry => 0
> >> > > > > zookeeper.session.timeout => 1000
> >> > > > > zookeeper.recovery.retry.intervalmilli => 1
> >> > > > > hbase.rpc.timeout => 30000
> >> > > > >
> >> > > > > Is the zk timeout too low?
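
One thing that bears on the zk timeout question: a ZooKeeper server does not simply accept the session timeout a client requests. With its default settings it clamps the request between minSessionTimeout = 2 × tickTime and maxSessionTimeout = 20 × tickTime. The arithmetic can be sketched as follows; ZkSessionTimeout is a hypothetical helper, and the sketch assumes the ensemble has not overridden those two limits. Separately, the shorter the negotiated timeout, the shorter a GC pause or network hiccup needs to be to expire a session.

```java
/**
 * Sketch of how a ZooKeeper server bounds a client's requested session
 * timeout, assuming the server keeps ZooKeeper's default limits:
 * minSessionTimeout = 2 * tickTime and maxSessionTimeout = 20 * tickTime.
 */
class ZkSessionTimeout {
    static int negotiatedMs(int requestedMs, int tickTimeMs) {
        int minMs = 2 * tickTimeMs;  // default lower bound
        int maxMs = 20 * tickTimeMs; // default upper bound
        return Math.max(minMs, Math.min(maxMs, requestedMs));
    }
}
```

Assuming a tickTime of 2000 ms (check the ensemble's actual value), a requested 1000 ms becomes 4000 ms, and a requested 90000 ms is capped at 40000 ms.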
> >> > > > >
> >> > > > > On Fri, Jun 5, 2015 at 11:37 AM, ramkrishna vasudevan <ramkrishna.s.vasudevan@gmail.com> wrote:
> >> > > > >
> >> > > > > > When you started your client server, was the META table assigned?
> >> > > > > > Maybe something happened around that time and the client app was
> >> > > > > > just waiting for the meta table to be assigned. It would have
> >> > > > > > retried. Can you check the logs?
> >> > > > > >
> >> > > > > > The good part here is that the standalone client was able to
> >> > > > > > succeed, which means new clients were able to talk to the server
> >> > > > > > successfully, and hence restarting your client solved your problem.
> >> > > > > > It may be difficult to troubleshoot the exact issue with the limited
> >> > > > > > info, but check whether your client app gets stalled regularly; if
> >> > > > > > it does, it is better to troubleshoot your app and the way it
> >> > > > > > accesses the server.
> >> > > > > >
> >> > > > > > On Fri, Jun 5, 2015 at 11:21 AM, PRANEESH KUMAR <praneesh.sankar@gmail.com> wrote:
> >> > > > > >
> >> > > > > > > The client connection was in a stalled state, but only one
> >> > > > > > > hconnection thread was found in our thread dump, and it was
> >> > > > > > > waiting indefinitely in the BoundedCompletionService.take call.
> >> > > > > > > Meanwhile we ran a standalone test program, which was successful.
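
BoundedCompletionService is HBase-internal, so the sketch below uses the JDK's ExecutorCompletionService as a stand-in, on the assumption that a take() on it has the same "wait with no deadline" shape. It illustrates why a thread parked in take() never throws on its own, and how poll(timeout, unit) lets a caller notice a stuck task instead. TakeVersusPoll is a hypothetical helper.

```java
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/**
 * JDK-only illustration: take() has no deadline, so a task that never
 * finishes parks the caller forever without an exception. poll(timeout,
 * unit) returns null when nothing completes in time.
 */
class TakeVersusPoll {
    /** Returns true if the task's result arrived within timeoutMs. */
    static boolean awaitWithDeadline(long taskSleepMs, long timeoutMs)
            throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            CompletionService<String> cs = new ExecutorCompletionService<>(pool);
            cs.submit(() -> {
                Thread.sleep(taskSleepMs); // stands in for a call that may never return
                return "done";
            });
            // cs.take() would block here indefinitely if the task never finished.
            Future<String> f = cs.poll(timeoutMs, TimeUnit.MILLISECONDS);
            return f != null;
        } finally {
            pool.shutdownNow(); // interrupts the task if it is still running
        }
    }
}
```

A caller that gets false back can log, retry, or fail the request, which surfaces the problem instead of leaving a silent hang.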
> >> > > > > > >
> >> > > > > > > Once we restarted the client server, the problem was resolved.
> >> > > > > > >
> >> > > > > > > The basic doubt is this: when the hconnection thread stalled, why
> >> > > > > > > did the HBase client fail to create any more hconnections (the max
> >> > > > > > > pool size was 10)? And if there was a problem with the table/meta
> >> > > > > > > regions, how did the test program succeed?
> >> > > > > > >
> >> > > > > > > Regards,
> >> > > > > > > Praneesh
> >> > > > > > >
> >> > > > > > > On Fri, Jun 5, 2015 at 10:21 AM, ramkrishna vasudevan <ramkrishna.s.vasudevan@gmail.com> wrote:
> >> > > > > > >
> >> > > > > > > > Can you tell us more? Is your client not working at all, i.e.
> >> > > > > > > > stalled? Or are you seeing some results but finding it slower
> >> > > > > > > > than you expected?
> >> > > > > > > >
> >> > > > > > > > What type of workload are you running? Are all the tables
> >> > > > > > > > healthy? Are you able to read or write to them individually
> >> > > > > > > > using the hbase shell?
> >> > > > > > > >
> >> > > > > > > > On Fri, Jun 5, 2015 at 10:18 AM, PRANEESH KUMAR <praneesh.sankar@gmail.com> wrote:
> >> > > > > > > >
> >> > > > > > > > > Hi Ram,
> >> > > > > > > > >
> >> > > > > > > > > The cluster ran without any problem for about 2 to 3 days
> >> > > > > > > > > under low load; once we enabled it for high load, we
> >> > > > > > > > > immediately faced this issue.
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > Regards,
> >> > > > > > > > > Praneesh.
> >> > > > > > > > >
> >> > > > > > > > > On Thursday 4 June 2015, ramkrishna vasudevan <ramkrishna.s.vasudevan@gmail.com> wrote:
> >> > > > > > > > >
> >> > > > > > > > > > Is your cluster in working condition? Can you check whether
> >> > > > > > > > > > META has been assigned properly? If the META table is not
> >> > > > > > > > > > initialized and opened, then your client thread will hang.
> >> > > > > > > > > >
> >> > > > > > > > > > Regards
> >> > > > > > > > > > Ram
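
The reply above says the client thread will hang if META is not assigned. As a generic contrast, a lookup whose retries are bounded, in the spirit of hbase.client.retries.number, ends with an exception instead of waiting forever. BoundedLookup is a hypothetical helper, not HBase's locateRegionInMeta, and the linear backoff is an arbitrary choice for the sketch.

```java
import java.util.Optional;
import java.util.function.Supplier;

/**
 * Generic bounded-retry sketch (hypothetical helper): try a lookup, sleep
 * between attempts, and give up with an exception after maxAttempts.
 */
class BoundedLookup {
    static <T> T retry(Supplier<Optional<T>> lookup, int maxAttempts, long pauseMs)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<T> result = lookup.get();
            if (result.isPresent()) {
                return result.get();
            }
            if (attempt < maxAttempts) {
                Thread.sleep(pauseMs * attempt); // linear backoff, chosen arbitrarily
            }
        }
        throw new IllegalStateException("lookup failed after " + maxAttempts + " attempts");
    }
}
```

A lookup that succeeds on its third try returns normally within a limit of 5 attempts; one that never succeeds fails after the last attempt with IllegalStateException.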
> >> > > > > > > > > >
> >> > > > > > > > > > On Thu, Jun 4, 2015 at 9:05 PM, PRANEESH KUMAR <praneesh.sankar@gmail.com> wrote:
> >> > > > > > > > > >
> >> > > > > > > > > > > Hi,
> >> > > > > > > > > > >
> >> > > > > > > > > > > We are using HBase 1.0.0. We are also facing the same issue:
> >> > > > > > > > > > > the client connection thread is waiting at
> >> > > > > > > > > > > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1200).
> >> > > > > > > > > > >
> >> > > > > > > > > > > Any help is appreciated.
> >> > > > > > > > > > >
> >> > > > > > > > > > > Regards,
> >> > > > > > > > > > > Praneesh
