From: Doug Judd <nuggetwheat@gmail.com>
To: core-dev@hadoop.apache.org
Date: Mon, 2 Feb 2009 12:51:42 -0800
Subject: Re: Hadoop 0.19.1

Hi Konstantin,

We are also heavy users of fsync(). I've been working with Dhruba on
HADOOP-4379. His most recent patch appears to work for Jim's situation.
However, there are still a couple of problems that need to be resolved
before we can start using it heavily:

1. After an application crash/restart, the file length (as returned by
getFileStatus) is incorrect, since the length recorded at the namenode is
stale. Ideally, getFileStatus() would return the accurate file length by
fetching the size of the last block from the primary datanode. If that is
not feasible, there should be some other way to obtain the actual file
length.
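(For illustration only: a minimal sketch of the kind of client-side
workaround this forces. The helper name is hypothetical, it is not from
Dhruba's patch, and it assumes the client can actually read past the
namenode-reported length, which may itself depend on lease state.)

    import java.io.IOException;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class FileLengthUtil {
      // Hypothetical workaround: the namenode-reported length can be
      // stale after a crash, so seek to it and keep reading until EOF
      // to count any bytes that reached the datanodes but were never
      // reported back to the namenode.
      public static long actualLength(FileSystem fs, Path path)
          throws IOException {
        long reported = fs.getFileStatus(path).getLen(); // possibly stale
        FSDataInputStream in = fs.open(path);
        try {
          in.seek(reported);
          byte[] buf = new byte[64 * 1024];
          long extra = 0;
          int n;
          while ((n = in.read(buf)) > 0) { // read past the reported length
            extra += n;
          }
          return reported + extra;
        } finally {
          in.close();
        }
      }
    }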
2. When an application comes up after a crash, it seems to hang for about
60 seconds waiting for lease recovery. Our database cannot go offline for
a whole minute doing nothing. In our case, when we come up after a crash
and try to re-open the log file, we know for certain that we are the
exclusive owner of that file. There should be a way to tell the system to
forcibly take over the lease and recover immediately.
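(Again for illustration: the only recourse today is to poll append()
until the namenode expires the previous writer's lease, which is exactly
the minute-long stall described above. The retry loop, timeout, and class
name below are assumptions for the sketch, not part of any patch.)

    import java.io.IOException;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class LogReopener {
      // Sketch of the poll-append workaround: append() fails while the
      // dead writer's lease is still held, so retry until the namenode
      // recovers the lease and grants it to us. A "force lease recovery"
      // call would let the new writer skip this wait entirely.
      public static FSDataOutputStream reopenLog(FileSystem fs, Path log)
          throws IOException, InterruptedException {
        long deadline = System.currentTimeMillis() + 90 * 1000L;
        while (true) {
          try {
            return fs.append(log); // succeeds once lease recovery completes
          } catch (IOException e) {
            if (System.currentTimeMillis() > deadline) {
              throw e; // still failing well past the soft-lease window
            }
            Thread.sleep(1000); // poll once per second
          }
        }
      }
    }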
What do you recommend? Is there any way we could get these two issues
fixed for 0.19.1, or should I file issues for them and get them on the
schedule for 0.19.2?

- Doug

On Mon, Feb 2, 2009 at 11:59 AM, Konstantin Shvachko wrote:

> Raghu, thanks for providing the link.
>
> Jim> Are you proposing disabling both append and sync?
>
> Jim, this statement is probably too strong.
> sync is not disabled per se; you will be able to use it, although
> its full semantics are not guaranteed in some failure scenarios.
> See more here:
> https://issues.apache.org/jira/browse/HADOOP-4663#action_12661802
>
> We will have to really disable append (throw
> UnsupportedOperationException), because otherwise the current solution
> may lead to loss of previously existing data.
>
> I agree with Nigel that there is a need for an urgent 0.19.1 release,
> because a lot of bugs were fixed since 0.18.2 and 0.19.0.
> The system is now stable on our clusters with 0.18.3; the same fixes
> went into 0.19.1.
>
> If we try to rush fixing the bugs for append (listed in my comment), we
> risk destabilizing the system again, and this is my main concern.
>
> Formally we should not release until a feature is fixed, but I think it
> is better to let people use a stable release with limited functionality
> rather than have full functionality with a risk of data loss.
>
> Hope this will work for everybody.
> --Konstantin
>
> Raghu Angadi wrote:
>
>> Raghu Angadi wrote:
>>
>>>> Is that also where we would find what is involved in making append
>>>> work in 0.19.1?
>>>
>>> If one knew what was enough to fix it properly, it would be easy. But
>>> over the last couple of months there have been many fixes (some of
>>> these jiras are listed in one of Konstantin's comments on HADOOP-4663).
>>
>> Here is Konstantin's comment I referred to (it was also linked from
>> HADOOP-4663, but is harder to find there):
>>
>> https://issues.apache.org/jira/browse/HADOOP-5027#action_12668136
>>
>> Raghu.
>>
>>> The discussions are still bringing up more cases where the
>>> implementation or algorithm should change. But these are improvements,
>>> for sure. Still, I doubt I would be ready to call it 'completely
>>> fixed'. It needs time and a lot of testing on large clusters.
>>>
>>> Personally I am +1 for getting these into the 0.19 branch. Most
>>> importantly, even clusters and applications not using append or sync
>>> were affected; that's why the extra caution.
>>>
>>> My 2 cents; hope this does not digress too much from the main topic.
>>>
>>> Raghu.
>>>
>>>> Thanks,
>>>> St.Ack
>>>>
>>>> On Fri, Jan 30, 2009 at 2:36 AM, Steve Loughran wrote:
>>>>
>>>>> Nigel Daley wrote:
>>>>>
>>>>>> Folks,
>>>>>>
>>>>>> Some Hadoop deployments have upgraded to 0.19.0. Clearly, the 0.19
>>>>>> branch has issues, and a 0.19.1 release is needed.
>>>>>>
>>>>>> Quality issues in the changes made for the file append feature have
>>>>>> prevented some from deploying Hadoop 0.19. One of these changes
>>>>>> (sync) has now been "fixed" by reducing its semantics in Hadoop
>>>>>> 0.18.3 (HADOOP-4997). This was necessary to stabilize the 0.18
>>>>>> branch.
>>>>>>
>>>>>> I would like to propose that we apply this same "fix" to sync in
>>>>>> 0.19.1 and 0.20.0. Since append requires the full semantics of sync,
>>>>>> I propose we also disable append (perhaps throw
>>>>>> UnsupportedOperationException from the API?). Yes, this would
>>>>>> unfortunately be an incompatible change between 0.19.0 and 0.19.1.
>>>>>> We can then take the time needed to fix append properly in 0.21.0.
>>>>>
>>>>> I can see some people being unhappy about this, but given a choice
>>>>> between having the filesystem work or not, hopefully they will see
>>>>> the merits of the change. And I am +1 on taking the time to fix
>>>>> things; fast fixes often create new problems.
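(A footnote on Konstantin's and Nigel's proposal above: the API-level
effect of disabling append would be something like the sketch below. The
wrapper class is purely illustrative; the real change would live inside
HDFS itself rather than in a wrapper.)

    import java.io.IOException;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Illustrative wrapper showing the proposed behavior: append()
    // refuses outright instead of risking loss of existing data.
    public class NoAppendFileSystem {
      private final FileSystem fs;

      public NoAppendFileSystem(FileSystem fs) {
        this.fs = fs;
      }

      public FSDataOutputStream append(Path f) {
        throw new UnsupportedOperationException(
            "append() is disabled pending a proper fix; see HADOOP-4663");
      }

      public FSDataOutputStream create(Path f) throws IOException {
        return fs.create(f); // everything else delegates unchanged
      }
    }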