hadoop-hdfs-dev mailing list archives

From Vinod Kumar Vavilapalli <vino...@apache.org>
Subject Re: Re-swizzle 2.3
Date Fri, 07 Feb 2014 04:56:46 GMT

Thanks. Please post your findings; Jian wrote this part of the code, and between him and me
we can take care of those issues.

+1 for going ahead with the revert on branch-2.3. I'll go do that tomorrow morning unless
I hear otherwise from Jian.

Thanks,
+Vinod


On Feb 6, 2014, at 8:28 PM, Alejandro Abdelnur <tucu@cloudera.com> wrote:

> Hi Vinod,
> 
> Nothing confidential,
> 
> * With unmanaged AMs I'm seeing the trace I posted a couple of days ago
> in YARN-1577 (
> https://issues.apache.org/jira/browse/YARN-1577?focusedCommentId=13891853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13891853
> ).
> 
> * Also, Robert has been digging into Oozie testcases failing/getting stuck
> with several token renewer threads. These failures happened consistently at
> different places around the same testcases (as if some file descriptors were
> leaking), and reverting YARN-1490 fixes the problem. The potential issue
> with this is that a long-running client (Oozie) may run into this situation
> and become unstable.
> 
> *Robert,* would you mind posting the JVM thread dump taken while the test
> was hanging to YARN-1490?
> 
> After YARN-1493 & YARN-1490 we have had a couple of JIRAs trying to fix
> issues introduced by them, and we still haven't gotten them right.
> 
> Because of this, the improvements driven by YARN-1493 & YARN-1490 seem to
> require more work before they are stable.
> 
> IMO, being conservative, we should do 2.3 without them and roll them into
> 2.4. If we want to do regular releases we will have to make these kinds of
> calls; otherwise our releases will start dragging.
> 
> Sounds like a plan?
> 
> Thanks.
> 
> 
> 
> On Thu, Feb 6, 2014 at 6:27 PM, Vinod Kumar Vavilapalli
> <vinodkv@apache.org> wrote:
> 
>> Hey
>> 
>> I am not against removing them from 2.3 if that is helpful for progress.
>> But I want to understand what the issues are before we make that decision.
>> 
>> There is the issue with unmanaged AMs that is clearly known, and I have
>> been meaning to get to it for the past two days, but couldn't. What is this
>> new issue that we have (confidently?) pinned down to YARN-1490?
>> 
>> Thanks
>> +Vinod
>> 
>> On Feb 6, 2014, at 5:07 PM, Alejandro Abdelnur <tucu@cloudera.com> wrote:
>> 
>>> Thanks Robert,
>>> 
>>> All,
>>> 
>>> So it seems that YARN-1493 and YARN-1490 are introducing serious
>>> regressions.
>>> 
>>> I would propose to revert them and the follow-up JIRAs from the 2.3 branch
>>> and keep working on them in trunk/branch-2 until they are stable (I would
>>> even prefer reverting them from branch-2, so as not to block a 2.4 if they
>>> are not ready in time).
>>> 
>>> As I've mentioned before, the list of JIRAs to revert was:
>>> 
>>> YARN-1493
>>> YARN-1490
>>> YARN-1166
>>> YARN-1041
>>> YARN-1566
>>> 
>>> Plus 2 additional JIRAs committed since my email on this issue 2 days ago:
>>> 
>>> YARN-1661
>>> YARN-1689 (not sure if this JIRA is related in functionality to the
>>> previous ones, but it is creating conflicts).
>>> 
>>> I think we should hold off on continuing work on top of something that is
>>> broken until the broken stuff is fixed.
>>> 
>>> Quoting Arun, "Committers - Henceforth, please use extreme caution while
>>> committing to branch-2.3. Please commit *only* blockers to 2.3."
>>> 
>>> YARN-1661 & YARN-1689 are not blockers.
>>> 
>>> Unless there are objections, I'll revert all these JIRAs from branch-2.3
>>> tomorrow around noon and I'll update fixedVersion in the JIRAs.
>>> 
>>> I'm inclined to revert them from branch-2 as well.
>>> 
>>> Thoughts?
>>> 
>>> Thanks.
>>> 
>>> 
>>> On Thu, Feb 6, 2014 at 3:54 PM, Robert Kanter <rkanter@cloudera.com> wrote:
>>> 
>>>> I think we should revert YARN-1490 from the Hadoop 2.3 branch. I think it
>>>> was causing some strange behavior in the Oozie unit tests:
>>>> 
>>>> Basically, we use a single MiniMRCluster and MiniDFSCluster across all
>>>> unit tests in a module. With YARN-1490 we saw that, regardless of test
>>>> order, the last few tests would time out waiting for an MR job to finish;
>>>> on slower machines, the entire test suite would time out. Through some
>>>> digging, I found that we were getting a ton of "Connection refused"
>>>> exceptions from the LeaseRenewer talking to the NN and a few from the AM
>>>> talking to the RM.
>>>> 
>>>> After a bunch of investigation, I found that the problem went away once
>>>> YARN-1490 was removed, though I couldn't figure out the exact cause.
>>>> Even though this occurred in unit tests, it does make me concerned that it
>>>> could indicate some bigger issue in a long-running real cluster (where
>>>> everything isn't running on the same machine) that we haven't seen yet.
>>>> 
>>>> 
>>>> 
>>>> On Thu, Feb 6, 2014 at 3:06 PM, Karthik Kambatla <kasha@cloudera.com> wrote:
>>>> 
>>>>> I have marked MAPREDUCE-5744 a blocker for 2.3. Committing it shortly.
>>>>> Will pull it out of branch-2.3 if anyone objects.
>>>>> 
>>>>> 
>>>>> On Thu, Feb 6, 2014 at 2:04 PM, Arpit Agarwal <aagarwal@hortonworks.com> wrote:
>>>>> 
>>>>>> Merged HADOOP-10273 to branch-2.3 as r1565456.
>>>>>> 
>>>>>> 
>>>>>> On Wed, Feb 5, 2014 at 4:49 PM, Arpit Agarwal <aagarwal@hortonworks.com> wrote:
>>>>>> 
>>>>>>> IMO HADOOP-10273 (Fix 'mvn site') should be included in 2.3.
>>>>>>> 
>>>>>>> I will merge it to branch-2.3 tomorrow PST if no one disagrees.
>>>>>>> 
>>>>>>> 
>>>>>>> On Tue, Feb 4, 2014 at 5:03 PM, Alejandro Abdelnur <tucu@cloudera.com> wrote:
>>>>>>> 
>>>>>>>> IMO YARN-1577 is a blocker; it is breaking unmanaged AMs in very odd
>>>>>>>> ways (to the point that it seems non-deterministic).
>>>>>>>> 
>>>>>>>> I'd say either YARN-1577 is fixed or we revert
>>>>>>>> YARN-1493/YARN-1490/YARN-1166/YARN-1041/YARN-1566 (almost clean
>>>>>>>> reverts) from the Hadoop 2.3 branch before doing the release.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> I've verified that after reverting those JIRAs things work fine with
>>>>>>>> unmanaged AMs.
>>>>>>>> 
>>>>>>>> Thanks.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Tue, Feb 4, 2014 at 11:45 AM, Arun C Murthy <acm@hortonworks.com> wrote:
>>>>>>>> 
>>>>>>>>> I punted YARN-1444 to 2.4 since it's a long-standing issue.
>>>>>>>>> 
>>>>>>>>> Jian is away and I don't see YARN-1577 & YARN-1206 making much
>>>>>>>>> progress till he is back, so I'm inclined to push both to 2.4 too.
>>>>>>>>> Any objections?
>>>>>>>>> 
>>>>>>>>> Looks like Daryn has both HADOOP-10301 & HDFS-4564 covered.
>>>>>>>>> 
>>>>>>>>> Overall, I'll try to get this out in the next couple of days if we
>>>>>>>>> can clear the list.
>>>>>>>>> 
>>>>>>>>> thanks,
>>>>>>>>> Arun
>>>>>>>>> 
>>>>>>>>> On Feb 3, 2014, at 12:14 PM, Arun C Murthy <acm@hortonworks.com> wrote:
>>>>>>>>> 
>>>>>>>>>> An update. Per https://s.apache.org/hadoop-2.3.0-blockers we are
>>>>>>>>>> now down to 5 blockers: 1 Common, 1 HDFS, 3 YARN.
>>>>>>>>>> 
>>>>>>>>>> Daryn (thanks!) has both of the non-YARN ones covered. Vinod is
>>>>>>>>>> helping out with the YARN ones.
>>>>>>>>>> 
>>>>>>>>>> thanks,
>>>>>>>>>> Arun
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> Arun C. Murthy
>>>>>>>>> Hortonworks Inc.
>>>>>>>>> http://hortonworks.com/
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> CONFIDENTIALITY NOTICE
>>>>>>>>> NOTICE: This message is intended for the use of the individual or
>>>>>>>>> entity to which it is addressed and may contain information that is
>>>>>>>>> confidential, privileged and exempt from disclosure under applicable
>>>>>>>>> law. If the reader of this message is not the intended recipient, you
>>>>>>>>> are hereby notified that any printing, copying, dissemination,
>>>>>>>>> distribution, disclosure or forwarding of this communication is
>>>>>>>>> strictly prohibited. If you have received this communication in
>>>>>>>>> error, please contact the sender immediately and delete it from your
>>>>>>>>> system. Thank You.
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> Alejandro
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Alejandro
>> 
>> 
>> 
> 
> 
> 
> -- 
> Alejandro

