Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hadoop.apache.org
Received-SPF: neutral (athena.apache.org: local policy)
MIME-Version: 1.0
Sender: balaji@balajin.net
In-Reply-To: 
 <CADY20s6ESW6WQA9j1mwr3hnNTx2F8yW+OYa4czUZXS=L=ZXOPw@mail.gmail.com>
References: 
 <CABbGW3wG0390=bmhSi1P=8J-ZSr373fEQk=YNe6idjFGiAEdqQ@mail.gmail.com>
	<CADY20s6XkN_Cub6+QVOjxUMyf2WHkCLT18etp9vjh1HRq7ZjfA@mail.gmail.com>
	<CA+4kjVvJihNu0oQ42gv0XArFek7gQsoUQRz9ej+8ttiK7r9CSw@mail.gmail.com>
	<CABbGW3y6X0YAQw3pgPt+BN7boxdsSmdoKt0rGBx=-0ydRdjgnw@mail.gmail.com>
	<CADY20s6ESW6WQA9j1mwr3hnNTx2F8yW+OYa4czUZXS=L=ZXOPw@mail.gmail.com>
Date: Sat, 27 Oct 2012 10:34:58 -0700
Message-ID: 
 <CACvhJWcfxVjDWyxWk8RuDuHt+Sm1yy1NTQO_8j0wS+S0B+SMYQ@mail.gmail.com>
Subject: Re: HDFS HA IO Fencing
From: 
 =?UTF-8?B?QmFsYWppIE5hcmF5YW5hbiAo4K6q4K6+4K6y4K6+4K6c4K6/IOCuqOCuvuCusOCuvuCur+Cuow==?=
	=?UTF-8?B?4K6p4K+NKQ==?= <lists@balajin.net>
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Content-Type: multipart/alternative; boundary=047d7b339025f6792804cd0ddac2

--047d7b339025f6792804cd0ddac2
Content-Type: text/plain; charset=UTF-8

If you use NFSv4, you should be able to use locks: when a machine dies or
fails to renew its lease, the other machine can take over.
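
As a rough sketch of the lease behaviour I mean (this only models the
semantics inside one process; it is not NFSv4 code, and the Lease class and
the node names "nn1"/"nn2" are made up for illustration):

```python
import time


class Lease:
    """A time-bounded lock: the holder must keep renewing or it loses ownership."""

    def __init__(self, duration, clock=time.monotonic):
        self.duration = duration   # seconds a grant or renewal stays valid
        self.clock = clock         # injectable so expiry can be exercised in tests
        self.holder = None         # hypothetical node id of the current holder
        self.expires_at = 0.0

    def acquire(self, node):
        """Grant the lease if it is free, expired, or already held by `node`."""
        now = self.clock()
        if self.holder in (None, node) or now >= self.expires_at:
            self.holder = node
            self.expires_at = now + self.duration
            return True
        return False

    def renew(self, node):
        """Extend the lease; fails if `node` is not the holder or renews too late."""
        now = self.clock()
        if self.holder == node and now < self.expires_at:
            self.expires_at = now + self.duration
            return True
        return False
```

The standby would keep calling acquire("nn2") and only succeed once "nn1"
has stopped renewing for a full lease period. Note that this assumes the
old holder and whoever grants the lease agree on how much time has passed,
which is the caveat Todd raised earlier in the thread.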

On Friday, October 26, 2012, Todd Lipcon wrote:

> NFS Locks typically last forever if you disconnect abruptly. So they are
> not sufficient -- your standby wouldn't be able to take over without manual
> intervention to remove the lock.
>
> If you want to build an unreliable system that might corrupt your data,
> you could set up 'shell(/bin/true)' as a second fencing method. But, it's
> really a bad idea. There are failure scenarios which could cause split
> brain if you do this, and you'd very likely lose data.
>
> -Todd
>
> On Fri, Oct 26, 2012 at 1:59 AM, lei liu <liulei412@gmail.com> wrote:
>
>> We are using NFS for shared storage. Can we use the Linux nfslock service
>> to implement I/O fencing?
>>
>>
>> 2012/10/26 Steve Loughran <stevel@hortonworks.com>
>>
>>>
>>>
>>> On 25 October 2012 14:08, Todd Lipcon <todd@cloudera.com> wrote:
>>>
>>>> Hi Liu,
>>>>
>>>> Locks are not sufficient, because there is no way to enforce a lock in
>>>> a distributed system without unbounded blocking. What you might be
>>>> referring to is a lease, but leases are still problematic unless you can
>>>> put bounds on the speed with which clocks progress on different machines,
>>>> _and_ have strict guarantees on the way each node's scheduler works. With
>>>> Linux and Java, the latter is tough.
>>>>
>>>>
>>> On any OS running in any virtual environment, including EC2, time is
>>> entirely unpredictable, just to make things worse.
>>>
>>>
>>> On a single machine you can use file locking, as the OS will know that
>>> the process is dead and close the file; other programs can attempt to open
>>> the same file with exclusive locking and, by getting the right failures,
>>> know that something else has the file, hence that the other process is
>>> live. With shared NFS storage you need to mount with softlock set,
>>> precisely to stop file locks lasting until some lease has expired, because
>>> the on-host liveness probes detect failure faster and want to react to it.
>>>
>>>
>>> -Steve
>>>
>>
>>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>


-- 
Thanks
-balaji

--
http://balajin.net/blog/
http://flic.kr/balajijegan

--047d7b339025f6792804cd0ddac2--