hbase-user mailing list archives

From "Srikanth P. Shreenivas" <Srikanth_Shreeni...@mindtree.com>
Subject RE: Query regarding HTable.get and timeouts
Date Thu, 18 Aug 2011 18:51:49 GMT
Please note that the line numbers I am referencing are from this file: https://github.com/apache/hbase/blob/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java



________________________________________
From: Srikanth P. Shreenivas
Sent: Friday, August 19, 2011 12:19 AM
To: user@hbase.apache.org
Subject: RE: Query regarding HTable.get and timeouts

Hi Stack,

Thanks a lot for your reply. It is always comforting to see such an active community,
and especially your prompt replies to queries.

Yes, I am running it as a GridGain task, so it runs in GridGain's thread pool. In this
case, we can imagine GridGain as something that hands off work to various worker threads
and waits asynchronously for it to complete. I have a 10-minute timeout, after which GridGain
considers the work timed out.
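
To illustrate the kind of hand-off I mean, here is a minimal sketch using a plain
java.util.concurrent pool. It is not GridGain code, and I do not know whether GridGain
behaves this way; the class and method names are hypothetical. It only shows that a
framework which cancels a timed-out task with Future.cancel(true) interrupts the worker
thread running it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for a task framework; not GridGain's API.
public class TimeoutSketch {

    // Returns true if the worker thread observed an interrupt after the timeout fired.
    static boolean workerSawInterrupt(long timeoutMillis) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch sawInterrupt = new CountDownLatch(1);

        Future<?> task = pool.submit(() -> {
            started.countDown();
            try {
                Thread.sleep(60_000); // stands in for a slow blocking call
            } catch (InterruptedException e) {
                sawInterrupt.countDown(); // the cancel below arrives here
            }
        });

        started.await(); // make sure the task is running before the timeout starts
        try {
            task.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            task.cancel(true); // true = interrupt the thread running the task
        }

        boolean interrupted = sawInterrupt.await(5, TimeUnit.SECONDS);
        pool.shutdownNow();
        return interrupted;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("worker interrupted: " + workerSawInterrupt(100));
    }
}
```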

What we are observing is that our tasks time out at the 10-minute boundary, and the delay seems
to be caused by the part of the work that does HTable.get.

My suspicion is that line 1255 in HConnectionManager.java calls Thread.currentThread().interrupt(),
after which the GridGain thread stops doing what it was meant to do and never responds
to the master node, resulting in a timeout on the master.

For line 1255 to execute, we have to assume that all retries were exhausted.
Hence my question: what would cause an HTable.get() to get into a situation where
HConnectionManager$HConnectionImplementation.getRegionServerWithRetries reaches line 1255?
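
To make the question concrete, here is a minimal sketch of a common Java retry idiom.
It is not the HBase source, and the helper name and parameters are hypothetical. The loop
sleeps between attempts and, if the sleep is interrupted, restores the interrupt flag and
gives up with an IOException. If the HBase loop follows this idiom, the exception could
also appear before the retries are used up, whenever something else interrupts the thread.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical illustration; not part of HBase.
public class RetrySketch {

    static <T> T callWithRetries(Callable<T> call, int maxRetries, long pauseMillis)
            throws IOException {
        Exception last = null;
        for (int tries = 0; tries < maxRetries; tries++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e; // remember the failure and retry after a pause
            }
            try {
                Thread.sleep(pauseMillis);
            } catch (InterruptedException e) {
                // Restore the flag so callers can see the interrupt, then give up.
                Thread.currentThread().interrupt();
                throw new IOException("Giving up: thread is interrupted.");
            }
        }
        throw new IOException("Retries exhausted after " + maxRetries + " attempts", last);
    }

    public static void main(String[] args) {
        // Simulate an interrupt that arrived from outside before the first pause.
        Thread.currentThread().interrupt();
        try {
            RetrySketch.<String>callWithRetries(
                    () -> { throw new IOException("simulated failure"); }, 10, 1000L);
        } catch (IOException e) {
            System.out.println(e.getMessage());
            System.out.println("flag still set: " + Thread.currentThread().isInterrupted());
        }
    }
}
```

In this sketch the "interrupted" exception is thrown on the first pause even though nine
retries remain, and the thread is left with its interrupt flag set.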


Regards,
Srikanth

________________________________________
From: saint.ack@gmail.com [saint.ack@gmail.com] on behalf of Stack [stack@duboce.net]
Sent: Friday, August 19, 2011 12:03 AM
To: user@hbase.apache.org
Subject: Re: Query regarding HTable.get and timeouts

Is your client running inside a container of some form and could the
container be doing the interrupting?   I've not come across
client-side thread interrupts before.
St.Ack

On Thu, Aug 18, 2011 at 7:37 AM, Srikanth P. Shreenivas
<Srikanth_Shreenivas@mindtree.com> wrote:
> Hi,
>
> We are experiencing an issue in our HBase cluster wherein some of the gets are timing out at:
>
> java.io.IOException: Giving up trying to get region server: thread is interrupted.
>                at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1016)
>                at org.apache.hadoop.hbase.client.HTable.get(HTable.java:546)
>
>
> When we look at the logs of the master, ZooKeeper and the region servers, nothing
> indicates anything abnormal.
>
> I tried looking at the functions below, but at this point could not make much of them.
> https://github.com/apache/hbase/blob/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
>  - getRegionServerWithRetries starts at line 1233
> https://github.com/apache/hbase/blob/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java
>  - HTable.get starts at line 611
>
>
> Could you please suggest the scenarios in which all retries can get exhausted,
> resulting in thread interruption?
>
> We have seen this issue in two of our HBase clusters, where the load is quite low. We have
> 20 reads per minute, and we run 1 ZooKeeper and 4 region servers in fully-distributed mode (Hadoop).
> We are using CDH3.
>
> Thanks,
> Srikanth