Subject: Re: HBaseClient recovery from .META. server power down
From: Suraj Varma
To: user@hbase.apache.org
Date: Tue, 10 Jul 2012 10:22:58 -0700

Yes. On the maxRetries, though ... I saw the code
(http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hbase/hbase/0.90.2/org/apache/hadoop/hbase/ipc/HBaseClient.java#677)
show

this.maxRetries = conf.getInt("hbase.ipc.client.connect.max.retries", 0);

So it looks like by default maxRetries is set to 0? So ... there is
effectively no retry (i.e. it is fail-fast). (A minimal client-side
hbase-site.xml sketch covering the two settings discussed in this thread
is appended after the quoted messages below.)
--Suraj

On Tue, Jul 10, 2012 at 10:12 AM, N Keywal wrote:
> Thanks for the jira.
> The client can be connected to multiple RSs, depending on the rows it is
> working on. So yes, it's initial, but it's a dynamic initial :-).
> That said, there is a retry on error ...
>
> On Tue, Jul 10, 2012 at 6:46 PM, Suraj Varma wrote:
>> I will create a JIRA ticket ...
>>
>> The only side effect I could think of is ... if an RS is in a GC pause
>> of a few seconds, any _new_ client trying to connect would get connect
>> failures. So ... the _initial_ connection to the RS is what would
>> suffer from a super-low setting of the ipc.socket.timeout. This was my
>> read of the code.
>>
>> So I was hoping to get confirmation that this is the only side effect.
>> Again - this is on the client side - I wouldn't risk doing this on the
>> cluster side ...
>> --Suraj
>>
>> On Mon, Jul 9, 2012 at 9:44 AM, N Keywal wrote:
>>> Hi,
>>>
>>> What you're describing - the 35-minute recovery time - seems to match
>>> the code. And it's a bug (still there on trunk). Could you please
>>> create a jira for it? If you have the logs, even better.
>>>
>>> Lowering the ipc.socket.timeout seems to be an acceptable partial
>>> workaround. Setting it to 10s seems ok to me. Lower than this ... I
>>> don't know.
>>>
>>> N.
>>>
>>> On Mon, Jul 9, 2012 at 6:16 PM, Suraj Varma wrote:
>>>> Hello:
>>>> I'd like to get advice on the below strategy of decreasing the
>>>> "ipc.socket.timeout" configuration on the HBase client side ... has
>>>> anyone tried this? Has anyone had any issues with configuring this
>>>> lower than the default 20s?
>>>>
>>>> Thanks,
>>>> --Suraj
>>>>
>>>> On Mon, Jul 2, 2012 at 5:51 PM, Suraj Varma wrote:
>>>>> By "power down" below, I mean powering down the host with the RS
>>>>> that holds the .META. table. (So, essentially, the host IP is
>>>>> unreachable and the RS/DN is gone.)
>>>>>
>>>>> Just wanted to clarify my steps below ...
>>>>> --S
>>>>>
>>>>> On Mon, Jul 2, 2012 at 5:36 PM, Suraj Varma wrote:
>>>>>> Hello:
>>>>>> We've been doing some failure-scenario tests by powering down the
>>>>>> host of the region server holding .META., and while the HBase
>>>>>> cluster itself recovers and reassigns the META region and other
>>>>>> regions (after we tweaked down the default timeouts), our client
>>>>>> apps using HBaseClient take a long time to recover.
>>>>>>
>>>>>> hbase-0.90.6 / cdh3u4 / JDK 1.6.0_23
>>>>>>
>>>>>> Process:
>>>>>> 1) Apply load via the client app on the HBase cluster for several
>>>>>> minutes.
>>>>>> 2) Power down the region server holding the .META. region.
>>>>>> 3) Measure how long it takes for the cluster to reassign the META
>>>>>> table and for client threads to re-look-up and re-orient to the
>>>>>> reduced cluster (minus the RS and DN on that host).
>>>>>>
>>>>>> What we see:
>>>>>> 1) Client threads spike up to the maxThread size ... and take over
>>>>>> 35 minutes to recover (i.e. for the thread count to go back to
>>>>>> normal) - no calls are being serviced - they are all just backed up
>>>>>> on a synchronized method ...
>>>>>>
>>>>>> 2) Essentially, all the client app threads queue up behind the
>>>>>> HBaseClient.setupIOStreams method in oahh.ipc.HBaseClient
>>>>>> (http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hbase/hbase/0.90.2/org/apache/hadoop/hbase/ipc/HBaseClient.java#312).
>>>>>> http://tinyurl.com/7js53dj
>>>>>>
>>>>>> After taking several thread dumps we found that the thread within
>>>>>> this synchronized method was blocked on
>>>>>> NetUtils.connect(this.socket, remoteId.getAddress(), getSocketTimeout(conf));
>>>>>>
>>>>>> Essentially, the thread that got the lock would try to connect to
>>>>>> the dead RS (until the socket times out), retry, and then the next
>>>>>> thread gets in, and so forth.
>>>>>>
>>>>>> Solution tested:
>>>>>> -------------------
>>>>>> The ipc.HBaseClient code shows the ipc.socket.timeout default is
>>>>>> 20s. We dropped this down to a low number (1000 ms, 100 ms, etc.)
>>>>>> and the recovery was much faster (within a couple of minutes).
>>>>>>
>>>>>> So we're thinking of setting the HBase client side hbase-site.xml
>>>>>> with an ipc.socket.timeout of 100 ms.
>>>>>> Looking at the code, it appears that this is only ever used during
>>>>>> the initial "HConnection" setup via NetUtils.connect, and should
>>>>>> only come into play when connectivity to a region server is lost
>>>>>> and needs to be re-established - i.e. it does not affect normal
>>>>>> "RPC" activity, as this is just the connect timeout.
>>>>>>
>>>>>> Am I reading the code right? Any thoughts on whether this is too
>>>>>> low for comfort? (Our internal tests did not show any errors
>>>>>> related to timeouts etc. during normal operation ... but I just
>>>>>> wanted to run this by the experts.)
>>>>>>
>>>>>> Note that this timeout tweak is only on the HBase client side.
>>>>>> Thanks,
>>>>>> --Suraj
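
For illustration, here is a minimal sketch of the client-side
hbase-site.xml change discussed in this thread. The property names
(ipc.socket.timeout, hbase.ipc.client.connect.max.retries) are the ones
quoted above for HBase 0.90.x; the 100 ms value is the one used in the
tests described in the thread, not a general recommendation, and the
retries entry is only included to make its fail-fast default explicit:

    <!-- Client-side hbase-site.xml only; the thread advises against
         applying this on the cluster side. -->
    <configuration>
      <property>
        <!-- Connect timeout used by HBaseClient when establishing or
             re-establishing a connection to a region server.
             Default is 20000 ms (20s). -->
        <name>ipc.socket.timeout</name>
        <value>100</value>
      </property>
      <property>
        <!-- Connect retries; defaults to 0 (fail-fast) per the code
             quoted at the top of the thread. Shown at its default. -->
        <name>hbase.ipc.client.connect.max.retries</name>
        <value>0</value>
      </property>
    </configuration>

Per the thread, this timeout is used only for the initial connection
setup in HBaseClient.setupIOStreams (via NetUtils.connect), not for
in-flight RPCs, so the main risk of a very low value is that a brand-new
connection attempt to a region server in a multi-second GC pause would
fail fast.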