hadoop-general mailing list archives

From Nigel Daley <nda...@mac.com>
Subject Re: Patch testing
Date Wed, 26 Jan 2011 18:05:42 GMT
raid (contrib) test hanging: TestBlockFixer

I forced 2 thread dumps.  Both hung in the same place.  Filed https://issues.apache.org/jira/browse/MAPREDUCE-2283
This is a blocker for turning on MR precommit.

Cheers,
Nige

On Jan 25, 2011, at 11:19 PM, Nigel Daley wrote:

> Started another trial run of MR precommit testing:
> https://hudson.apache.org/hudson/view/G-L/view/Hadoop/job/PreCommit-MAPREDUCE-Build/17/
> 
> Let's see if 17th time is a charm...
> 
> Nige
> 
> On Jan 7, 2011, at 5:14 PM, Todd Lipcon wrote:
> 
>> On Fri, Jan 7, 2011 at 2:11 PM, Nigel Daley <ndaley@mac.com> wrote:
>> 
>>> Hrm, the MR precommit test I'm running has hung (been running for 14 hours
>>> so far).  FWIW, 2 HDFS precommit tests are hung too.  I suspect it could be
>>> the NFS mounts on the machines.  I forced a thread dump which you can see in
>>> the console:
>>> https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/10/console
>>> 
>>> 
>> Strange, haven't seen a hang like that before in handleConnectionFailure. It
>> should retry for 15 minutes max in that loop.
>> 
>> 
>>> Any other ideas why these might be hanging?
>>> 
>>> 
>> There is an HDFS bug right now that can cause hangs on some tests -
>> HDFS-1529 - would appreciate if someone can take a look. But I don't think
>> this is responsible for the MR hang above.
>> 
>> -Todd
>> 
>> 
>>> On Jan 5, 2011, at 5:42 PM, Todd Lipcon wrote:
>>> 
>>>> On Wed, Jan 5, 2011 at 4:39 PM, Nigel Daley <ndaley@mac.com> wrote:
>>>> 
>>>>> Thanks for looking into it Todd.  Let's first see if you think it can be
>>>>> fixed quickly.  Let me know.
>>>>> 
>>>>> 
>>>> No problem, it wasn't too bad after all. Patch up on HADOOP-7087 which fixes
>>>> this test timeout for me.
>>>> 
>>>> -Todd
>>>> 
>>>> 
>>>>> On Jan 5, 2011, at 4:33 PM, Todd Lipcon wrote:
>>>>> 
>>>>>> On Wed, Jan 5, 2011 at 4:19 PM, Nigel Daley <ndaley@mac.com> wrote:
>>>>>> 
>>>>>>> Todd, would love to get
>>>>>>> https://issues.apache.org/jira/browse/MAPREDUCE-2121 fixed first since
>>>>>>> this is failing every night on trunk.
>>>>>>> 
>>>>>> 
>>>>>> What if we disable that test, move that issue to 0.22 blocker, and then
>>>>>> enable the test-patch? I'll also look into that one today, but if it's
>>>>>> something that will take a while to fix, I don't think we should hold off
>>>>>> the useful testing for all the other patches.
>>>>>> 
>>>>>> -Todd
>>>>>> 
>>>>>> On Jan 5, 2011, at 2:45 PM, Todd Lipcon wrote:
>>>>>>> 
>>>>>>>> Hi Nigel,
>>>>>>>> 
>>>>>>>> MAPREDUCE-2172 has been fixed for a while. Are there any other particular
>>>>>>>> JIRAs you think need to be fixed before the MR test-patch queue gets
>>>>>>>> enabled? I have a lot of outstanding patches and doing all the test-patch
>>>>>>>> turnaround manually on 3 different boxes is a real headache.
>>>>>>>> 
>>>>>>>> Thanks
>>>>>>>> -Todd
>>>>>>>> 
>>>>>>>> On Tue, Dec 21, 2010 at 1:33 PM, Nigel Daley <ndaley@mac.com> wrote:
>>>>>>>> 
>>>>>>>>> Ok, HDFS is now enabled.  You'll see a stream of updates shortly on the ~30
>>>>>>>>> Patch Available HDFS issues.
>>>>>>>>> 
>>>>>>>>> Nige
>>>>>>>>> 
>>>>>>>>> On Dec 20, 2010, at 12:42 PM, Jakob Homan wrote:
>>>>>>>>> 
>>>>>>>>>> I committed HDFS-1511 this morning.  We should be good to go.  I can
>>>>>>>>>> haz snooty robot butler?
>>>>>>>>>> 
>>>>>>>>>> On Fri, Dec 17, 2010 at 8:31 PM, Konstantin Boudnik <cos@apache.org> wrote:
>>>>>>>>>>> Thanks Jacob. I am wasted already but I can do it on Sun, I think,
>>>>>>>>>>> unless it is done earlier.
>>>>>>>>>>> --
>>>>>>>>>>> Take care,
>>>>>>>>>>> Konstantin (Cos) Boudnik
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Fri, Dec 17, 2010 at 19:41, Jakob Homan <jghoman@gmail.com> wrote:
>>>>>>>>>>>> Ok.  I'll get a patch out for 1511 tomorrow, unless someone wants to
>>>>>>>>>>>> whip one up tonight.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On Fri, Dec 17, 2010 at 7:22 PM, Nigel Daley <ndaley@mac.com> wrote:
>>>>>>>>>>>>> I agree with Cos on fixing HDFS-1511 first. Once that is done I'll
>>>>>>>>>>>>> enable hdfs patch testing.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>> Nige
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Sent from my iPhone4
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Dec 17, 2010, at 7:01 PM, Konstantin Boudnik <cos@apache.org> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> One more issue needs to be addressed before test-patch is turned on
>>>>>>>>>>>>>> for HDFS:
>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/HDFS-1511
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Take care,
>>>>>>>>>>>>>> Konstantin (Cos) Boudnik
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Fri, Dec 17, 2010 at 16:17, Konstantin Boudnik <cos@apache.org> wrote:
>>>>>>>>>>>>>>> Considering that because of these 4 faulty cases every patch will be
>>>>>>>>>>>>>>> -1'ed a patch author will still have to look at it and make a comment
>>>>>>>>>>>>>>> why this particular -1 isn't valid. Lesser work, perhaps, but messier
>>>>>>>>>>>>>>> IMO. I'm not blocking it - I just feel like there's a better way.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Take care,
>>>>>>>>>>>>>>> Konstantin (Cos) Boudnik
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Fri, Dec 17, 2010 at 15:55, Jakob Homan <jghoman@gmail.com> wrote:
>>>>>>>>>>>>>>>>> If HDFS is added to the test-patch queue right now we get
>>>>>>>>>>>>>>>>> nothing but dozens of -1'ed patches.
>>>>>>>>>>>>>>>> There aren't dozens of patches being submitted currently. The -1
>>>>>>>>>>>>>>>> isn't the important thing, it's the grunt work of actually running
>>>>>>>>>>>>>>>> (and waiting) for the tests, test-patch, etc. that Hudson does so that
>>>>>>>>>>>>>>>> the developer doesn't have to.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Fri, Dec 17, 2010 at 3:48 PM, Dhruba Borthakur <dhruba@gmail.com> wrote:
>>>>>>>>>>>>>>>>> +1, thanks for doing this.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Fri, Dec 17, 2010 at 3:19 PM, Jakob Homan <jghoman@gmail.com> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> So, with test-patch updated to show the failing tests, saving the
>>>>>>>>>>>>>>>>>> developers the need to go and verify that the failed tests are all
>>>>>>>>>>>>>>>>>> known, how do people feel about turning on test-patch again for HDFS
>>>>>>>>>>>>>>>>>> and mapred?  I think it'll help prevent any more tests from entering
>>>>>>>>>>>>>>>>>> the "yeah, we know" category.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> jg
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Wed, Nov 17, 2010 at 5:08 PM, Jakob Homan <jhoman@yahoo-inc.com> wrote:
>>>>>>>>>>>>>>>>>>> True, each patch would get a -1 and the failing tests would need to be
>>>>>>>>>>>>>>>>>>> verified as those known bad (BTW, it would be great if Hudson could list
>>>>>>>>>>>>>>>>>>> which tests failed in the message it posts to JIRA).  But that's still quite
>>>>>>>>>>>>>>>>>>> a bit less error-prone work than if the developer runs the tests and
>>>>>>>>>>>>>>>>>>> test-patch themselves.  Also, with 22 being cut, there are a lot of patches
>>>>>>>>>>>>>>>>>>> up in the air and several developers are juggling multiple patches.  The
>>>>>>>>>>>>>>>>>>> more automation we can have, even if it's not perfect, will decrease errors
>>>>>>>>>>>>>>>>>>> we may make.
>>>>>>>>>>>>>>>>>>> -jg
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Nigel Daley wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On Nov 17, 2010, at 3:11 PM, Jakob Homan wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> It's also ready to run on MapReduce and HDFS but we won't turn it on
>>>>>>>>>>>>>>>>>>>>>> until these projects build and test cleanly.  Looks like both these projects
>>>>>>>>>>>>>>>>>>>>>> currently have test failures.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Assuming the projects are compiling and building, is there a reason to
>>>>>>>>>>>>>>>>>>>>> not turn it on despite the test failures? Hudson is invaluable to developers
>>>>>>>>>>>>>>>>>>>>> who then don't have to run the tests and test-patch themselves.  We didn't
>>>>>>>>>>>>>>>>>>>>> turn Hudson off when it was working previously and there were known
>>>>>>>>>>>>>>>>>>>>> failures.  I think one of the reasons we have more failing tests now is the
>>>>>>>>>>>>>>>>>>>>> higher cost of doing Hudson's work (not a great excuse I know).  This is
>>>>>>>>>>>>>>>>>>>>> particularly true now because several of the failing tests involve tests
>>>>>>>>>>>>>>>>>>>>> timing out, making the whole testing regime even longer.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Every single patch would get a -1 and need investigation.  Currently, that
>>>>>>>>>>>>>>>>>>>> would be about 83 investigations between MR and HDFS issues that are in
>>>>>>>>>>>>>>>>>>>> patch available state.  Shouldn't we focus on getting these tests fixed or
>>>>>>>>>>>>>>>>>>>> removed?  Also, I need to get MAPREDUCE-2172 fixed (applies to HDFS as
>>>>>>>>>>>>>>>>>>>> well) before I turn this on.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>> Nige
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> Connect to me at http://www.facebook.com/dhruba
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> Todd Lipcon
>>>>>>>> Software Engineer, Cloudera
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Todd Lipcon
>>>>>> Software Engineer, Cloudera
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> --
>>>> Todd Lipcon
>>>> Software Engineer, Cloudera
>>> 
>>> 
>> 
>> 
>> -- 
>> Todd Lipcon
>> Software Engineer, Cloudera
> 

