hadoop-general mailing list archives

From Nigel Daley <nda...@mac.com>
Subject Re: Patch testing
Date Wed, 26 Jan 2011 07:19:14 GMT
Started another trial run of MR precommit testing:
https://hudson.apache.org/hudson/view/G-L/view/Hadoop/job/PreCommit-MAPREDUCE-Build/17/

Let's see if 17th time is a charm...

Nige

On Jan 7, 2011, at 5:14 PM, Todd Lipcon wrote:

> On Fri, Jan 7, 2011 at 2:11 PM, Nigel Daley <ndaley@mac.com> wrote:
> 
>> Hrm, the MR precommit test I'm running has hung (been running for 14 hours
>> so far).  FWIW, 2 HDFS precommit tests are hung too.  I suspect it could be
>> the NFS mounts on the machines.  I forced a thread dump which you can see in
>> the console:
>> https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/10/console
>> 
>> 
> Strange, haven't seen a hang like that before in handleConnectionFailure. It
> should retry for 15 minutes max in that loop.
> 
> 
>> Any other ideas why these might be hanging?
>> 
>> 
> There is an HDFS bug right now that can cause hangs on some tests -
> HDFS-1529 - would appreciate if someone can take a look. But I don't think
> this is responsible for the MR hang above.
> 
> -Todd
> 
> 
>> On Jan 5, 2011, at 5:42 PM, Todd Lipcon wrote:
>> 
>>> On Wed, Jan 5, 2011 at 4:39 PM, Nigel Daley <ndaley@mac.com> wrote:
>>> 
>>>> Thanks for looking into it Todd.  Let's first see if you think it can be
>>>> fixed quickly.  Let me know.
>>>> 
>>>> 
>>> No problem, it wasn't too bad after all. Patch up on HADOOP-7087 which fixes
>>> this test timeout for me.
>>> 
>>> -Todd
>>> 
>>> 
>>>> On Jan 5, 2011, at 4:33 PM, Todd Lipcon wrote:
>>>> 
>>>>> On Wed, Jan 5, 2011 at 4:19 PM, Nigel Daley <ndaley@mac.com> wrote:
>>>>> 
>>>>>> Todd, would love to get
>>>>>> https://issues.apache.org/jira/browse/MAPREDUCE-2121 fixed first since
>>>>>> this is failing every night on trunk.
>>>>>> 
>>>>> 
>>>>> What if we disable that test, move that issue to 0.22 blocker, and then
>>>>> enable the test-patch? I'll also look into that one today, but if it's
>>>>> something that will take a while to fix, I don't think we should hold off
>>>>> the useful testing for all the other patches.
>>>>> 
>>>>> -Todd
>>>>> 
>>>>> On Jan 5, 2011, at 2:45 PM, Todd Lipcon wrote:
>>>>>> 
>>>>>>> Hi Nigel,
>>>>>>> 
>>>>>>> MAPREDUCE-2172 has been fixed for a while. Are there any other particular
>>>>>>> JIRAs you think need to be fixed before the MR test-patch queue gets
>>>>>>> enabled? I have a lot of outstanding patches and doing all the test-patch
>>>>>>> turnaround manually on 3 different boxes is a real headache.
>>>>>>> 
>>>>>>> Thanks
>>>>>>> -Todd
>>>>>>> 
>>>>>>> On Tue, Dec 21, 2010 at 1:33 PM, Nigel Daley <ndaley@mac.com> wrote:
>>>>>>>
>>>>>>>> Ok, HDFS is now enabled.  You'll see a stream of updates shortly on the
>>>>>>>> ~30 Patch Available HDFS issues.
>>>>>>>> 
>>>>>>>> Nige
>>>>>>>> 
>>>>>>>> On Dec 20, 2010, at 12:42 PM, Jakob Homan wrote:
>>>>>>>> 
>>>>>>>>> I committed HDFS-1511 this morning.  We should be good to go.  I can
>>>>>>>>> haz snooty robot butler?
>>>>>>>>>
>>>>>>>>> On Fri, Dec 17, 2010 at 8:31 PM, Konstantin Boudnik <cos@apache.org> wrote:
>>>>>>>>>> Thanks Jakob. I am wasted already but I can do it on Sun, I think,
>>>>>>>>>> unless it is done earlier.
>>>>>>>>>> --
>>>>>>>>>> Take care,
>>>>>>>>>> Konstantin (Cos) Boudnik
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Fri, Dec 17, 2010 at 19:41, Jakob Homan <jghoman@gmail.com> wrote:
>>>>>>>>>>> Ok.  I'll get a patch out for 1511 tomorrow, unless someone wants to
>>>>>>>>>>> whip one up tonight.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Fri, Dec 17, 2010 at 7:22 PM, Nigel Daley <ndaley@mac.com> wrote:
>>>>>>>>>>>> I agree with Cos on fixing HDFS-1511 first.  Once that is done I'll
>>>>>>>>>>>> enable hdfs patch testing.
>>>>>>>>>>>> 
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> Nige
>>>>>>>>>>>> 
>>>>>>>>>>>> Sent from my iPhone4
>>>>>>>>>>>> 
>>>>>>>>>>>> On Dec 17, 2010, at 7:01 PM, Konstantin Boudnik <cos@apache.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> One more issue that needs to be addressed before test-patch is turned
>>>>>>>>>>>>> on for HDFS is
>>>>>>>>>>>>> https://issues.apache.org/jira/browse/HDFS-1511
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Take care,
>>>>>>>>>>>>> Konstantin (Cos) Boudnik
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Fri, Dec 17, 2010 at 16:17, Konstantin Boudnik <cos@apache.org> wrote:
>>>>>>>>>>>>>> Considering that because of these 4 faulty cases every patch will be
>>>>>>>>>>>>>> -1'ed a patch author will still have to look at it and make a comment
>>>>>>>>>>>>>> why this particular -1 isn't valid. Lesser work, perhaps, but messier
>>>>>>>>>>>>>> IMO. I'm not blocking it - I just feel like there's a better way.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Take care,
>>>>>>>>>>>>>> Konstantin (Cos) Boudnik
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Fri, Dec 17, 2010 at 15:55, Jakob Homan <jghoman@gmail.com> wrote:
>>>>>>>>>>>>>>>> If HDFS is added to the test-patch queue right now we get
>>>>>>>>>>>>>>>> nothing but dozens of -1'ed patches.
>>>>>>>>>>>>>>> There aren't dozens of patches being submitted currently.  The -1
>>>>>>>>>>>>>>> isn't the important thing, it's the grunt work of actually running
>>>>>>>>>>>>>>> (and waiting) for the tests, test-patch, etc. that Hudson does so
>>>>>>>>>>>>>>> that the developer doesn't have to.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Fri, Dec 17, 2010 at 3:48 PM, Dhruba Borthakur <dhruba@gmail.com> wrote:
>>>>>>>>>>>>>>>> +1, thanks for doing this.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, Dec 17, 2010 at 3:19 PM, Jakob Homan <jghoman@gmail.com> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> So, with test-patch updated to show the failing tests, saving the
>>>>>>>>>>>>>>>>> developers the need to go and verify that the failed tests are all
>>>>>>>>>>>>>>>>> known, how do people feel about turning on test-patch again for HDFS
>>>>>>>>>>>>>>>>> and mapred?  I think it'll help prevent any more tests from entering
>>>>>>>>>>>>>>>>> the "yeah, we know" category.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> jg
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Wed, Nov 17, 2010 at 5:08 PM, Jakob Homan <jhoman@yahoo-inc.com> wrote:
>>>>>>>>>>>>>>>>>> True, each patch would get a -1 and the failing tests would need to be
>>>>>>>>>>>>>>>>>> verified as those known bad (BTW, it would be great if Hudson could list
>>>>>>>>>>>>>>>>>> which tests failed in the message it posts to JIRA).  But that's still
>>>>>>>>>>>>>>>>>> quite a bit less error-prone work than if the developer runs the tests
>>>>>>>>>>>>>>>>>> and test-patch themselves.  Also, with 22 being cut, there are a lot of
>>>>>>>>>>>>>>>>>> patches up in the air and several developers are juggling multiple
>>>>>>>>>>>>>>>>>> patches.  The more automation we can have, even if it's not perfect,
>>>>>>>>>>>>>>>>>> will decrease errors we may make.
>>>>>>>>>>>>>>>>>> -jg
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Nigel Daley wrote:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On Nov 17, 2010, at 3:11 PM, Jakob Homan wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> It's also ready to run on MapReduce and HDFS but we won't turn it on
>>>>>>>>>>>>>>>>>>>>> until these projects build and test cleanly.  Looks like both these
>>>>>>>>>>>>>>>>>>>>> projects currently have test failures.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Assuming the projects are compiling and building, is there a reason to
>>>>>>>>>>>>>>>>>>>> not turn it on despite the test failures? Hudson is invaluable to
>>>>>>>>>>>>>>>>>>>> developers who then don't have to run the tests and test-patch
>>>>>>>>>>>>>>>>>>>> themselves.  We didn't turn Hudson off when it was working previously
>>>>>>>>>>>>>>>>>>>> and there were known failures.  I think one of the reasons we have more
>>>>>>>>>>>>>>>>>>>> failing tests now is the higher cost of doing Hudson's work (not a great
>>>>>>>>>>>>>>>>>>>> excuse I know).  This is particularly true now because several of the
>>>>>>>>>>>>>>>>>>>> failing tests involve tests timing out, making the whole testing regime
>>>>>>>>>>>>>>>>>>>> even longer.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Every single patch would get a -1 and need investigation.  Currently,
>>>>>>>>>>>>>>>>>>> that would be about 83 investigations between MR and HDFS issues that
>>>>>>>>>>>>>>>>>>> are in patch available state.  Shouldn't we focus on getting these
>>>>>>>>>>>>>>>>>>> tests fixed or removed?  Also, I need to get MAPREDUCE-2172 fixed
>>>>>>>>>>>>>>>>>>> (applies to HDFS as well) before I turn this on.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>> Nige
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> Connect to me at http://www.facebook.com/dhruba
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Todd Lipcon
>>>>>>> Software Engineer, Cloudera
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Todd Lipcon
>>>>> Software Engineer, Cloudera
>>>> 
>>>> 
>>> 
>>> 
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera
>> 
>> 
> 
> 
> -- 
> Todd Lipcon
> Software Engineer, Cloudera

