hadoop-general mailing list archives

From Hairong Kuang <hair...@yahoo-inc.com>
Subject Re: Revote: HDFS 0.20.1/HDFS-0.20.2 compatibility
Date Thu, 14 Jan 2010 00:28:47 GMT
Thanks Todd for taking the initiative to get the compatibility issue
resolved. I will review the patch.

+1 on Todd's proposal, on the condition that HDFS-101, HDFS-793, and the
compatibility fix are well tested.

Hairong

On 1/13/10 12:41 PM, "Eli Collins" <eli@cloudera.com> wrote:

> On Wed, Jan 13, 2010 at 12:24 PM, Todd Lipcon <todd@cloudera.com> wrote:
>> Hi all,
>> 
>> Last week we had a vote regarding the compatibility problem introduced in
>> branch-0.20 by the backport of HDFS-793, necessary for HDFS-101, which fixes
>> a large bug in the write pipeline recovery code. The majority of people
>> seemed to indicate that this incompatibility was unacceptable, and thus we
>> should pull it out.
>> 
>> However, I think everyone agrees that the bug itself is pretty critical, and
>> it would be good to have it fixed - Hairong indicated that it's likely going
>> to go to Yahoo's internal customers, and Cloudera would like to include it
>> as well. In our experience we've run into it several times - whenever there
>> are a few "bad apple" nodes in a cluster that haven't failed hard, it causes
>> a lot of write pipeline failures (particularly, any pipeline that picks a
>> bad node as the first node will not recover). For MapReduce it's not a huge
>> deal, since the tasks will rerun elsewhere and usually succeed, but for
>> applications like HBase or continuous logging to HDFS, it's a big problem.
>> 
>> I have taken the time to develop and test a patch for branch-0.20 which goes
>> on top of HDFS-793 and HDFS-101 but maintains compatibility with 0.20.1.
>> I've posted this patch and a summary of my testing to HDFS-872. Although
>> this code is tricky to get right, the hardest parts are with the thread
>> communication and understanding the correct semantics, which I've not
>> touched at all. I think as long as there's a good review of my patch, we
>> should feel comfortable introducing it into branch-0.20.
>> 
>> Thanks
>> -Todd
>> 
> 
> +1
> 
> This is a critical bug; it would have been a blocker for 20 had we
> known about it. Assuming your change that resolves the protocol
> incompatibility is reviewed and tested to people's liking, I think we
> should put it in 20.
> 
> Thanks,
> Eli

