hadoop-general mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Patch testing
Date Wed, 26 Jan 2011 18:32:20 GMT
On Wed, Jan 26, 2011 at 10:05 AM, Nigel Daley <ndaley@mac.com> wrote:

> raid (contrib) test hanging: TestBlockFixer
>
> I forced 2 thread dumps.  Both hung in the same place.  Filed
> https://issues.apache.org/jira/browse/MAPREDUCE-2283  This is a blocker
> for turning on MR precommit.
>

Since this is contrib, I'd like to suggest just disabling this test
temporarily. We can re-enable it once it's fixed.

Not having MR pre-commit working has been pretty painful.

-Todd


> On Jan 25, 2011, at 11:19 PM, Nigel Daley wrote:
>
> > Started another trial run of MR precommit testing:
> >
> > https://hudson.apache.org/hudson/view/G-L/view/Hadoop/job/PreCommit-MAPREDUCE-Build/17/
> >
> > Let's see if 17th time is a charm...
> >
> > Nige
> >
> > On Jan 7, 2011, at 5:14 PM, Todd Lipcon wrote:
> >
> >> On Fri, Jan 7, 2011 at 2:11 PM, Nigel Daley <ndaley@mac.com> wrote:
> >>
> >>> Hrm, the MR precommit test I'm running has hung (been running for 14 hours
> >>> so far).  FWIW, 2 HDFS precommit tests are hung too.  I suspect it could be
> >>> the NFS mounts on the machines.  I forced a thread dump which you can see in
> >>> the console:
> >>>
> >>> https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/10/console
> >>>
> >>>
> >> Strange, haven't seen a hang like that before in handleConnectionFailure. It
> >> should retry for 15 minutes max in that loop.
> >>
> >>
> >>> Any other ideas why these might be hanging?
> >>>
> >>>
> >> There is an HDFS bug right now that can cause hangs on some tests -
> >> HDFS-1529 - would appreciate if someone can take a look. But I don't think
> >> this is responsible for the MR hang above.
> >>
> >> -Todd
> >>
> >>
> >>> On Jan 5, 2011, at 5:42 PM, Todd Lipcon wrote:
> >>>
> >>>> On Wed, Jan 5, 2011 at 4:39 PM, Nigel Daley <ndaley@mac.com> wrote:
> >>>>
> >>>>> Thanks for looking into it Todd.  Let's first see if you think it can be
> >>>>> fixed quickly.  Let me know.
> >>>>>
> >>>>>
> >>>> No problem, it wasn't too bad after all. Patch up on HADOOP-7087 which fixes
> >>>> this test timeout for me.
> >>>>
> >>>> -Todd
> >>>>
> >>>>
> >>>>> On Jan 5, 2011, at 4:33 PM, Todd Lipcon wrote:
> >>>>>
> >>>>>> On Wed, Jan 5, 2011 at 4:19 PM, Nigel Daley <ndaley@mac.com> wrote:
> >>>>>>
> >>>>>>> Todd, would love to get
> >>>>>>> https://issues.apache.org/jira/browse/MAPREDUCE-2121 fixed first since
> >>>>>>> this is failing every night on trunk.
> >>>>>>>
> >>>>>>
> >>>>>> What if we disable that test, move that issue to 0.22 blocker, and then
> >>>>>> enable the test-patch? I'll also look into that one today, but if it's
> >>>>>> something that will take a while to fix, I don't think we should hold off
> >>>>>> the useful testing for all the other patches.
> >>>>>>
> >>>>>> -Todd
> >>>>>>
> >>>>>> On Jan 5, 2011, at 2:45 PM, Todd Lipcon wrote:
> >>>>>>>
> >>>>>>>> Hi Nigel,
> >>>>>>>>
> >>>>>>>> MAPREDUCE-2172 has been fixed for a while. Are there any other particular
> >>>>>>>> JIRAs you think need to be fixed before the MR test-patch queue gets
> >>>>>>>> enabled? I have a lot of outstanding patches and doing all the test-patch
> >>>>>>>> turnaround manually on 3 different boxes is a real headache.
> >>>>>>>>
> >>>>>>>> Thanks
> >>>>>>>> -Todd
> >>>>>>>>
> >>>>>>>> On Tue, Dec 21, 2010 at 1:33 PM, Nigel Daley <ndaley@mac.com> wrote:
> >>>>>>>>
> >>>>>>>>> Ok, HDFS is now enabled.  You'll see a stream of updates shortly on the ~30
> >>>>>>>>> Patch Available HDFS issues.
> >>>>>>>>>
> >>>>>>>>> Nige
> >>>>>>>>>
> >>>>>>>>> On Dec 20, 2010, at 12:42 PM, Jakob Homan wrote:
> >>>>>>>>>
> >>>>>>>>>> I committed HDFS-1511 this morning.  We should be good to go.  I can
> >>>>>>>>>> haz snooty robot butler?
> >>>>>>>>>>
> >>>>>>>>>> On Fri, Dec 17, 2010 at 8:31 PM, Konstantin Boudnik <cos@apache.org> wrote:
> >>>>>>>>>>> Thanks Jacob. I am wasted already but I can do it on Sun, I think,
> >>>>>>>>>>> unless it is done earlier.
> >>>>>>>>>>> --
> >>>>>>>>>>> Take care,
> >>>>>>>>>>> Konstantin (Cos) Boudnik
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Fri, Dec 17, 2010 at 19:41, Jakob Homan <jghoman@gmail.com> wrote:
> >>>>>>>>>>>> Ok.  I'll get a patch out for 1511 tomorrow, unless someone wants to
> >>>>>>>>>>>> whip one up tonight.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Fri, Dec 17, 2010 at 7:22 PM, Nigel Daley <ndaley@mac.com> wrote:
> >>>>>>>>>>>>> I agree with Cos on fixing HDFS-1511 first. Once that is done I'll
> >>>>>>>>>>>>> enable hdfs patch testing.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>> Nige
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Sent from my iPhone4
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Dec 17, 2010, at 7:01 PM, Konstantin Boudnik <cos@apache.org> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> One more issue that needs to be addressed before test-patch is turned on
> >>>>>>>>>>>>>> for HDFS is https://issues.apache.org/jira/browse/HDFS-1511
> >>>>>>>>>>>>>> --
> >>>>>>>>>>>>>> Take care,
> >>>>>>>>>>>>>> Konstantin (Cos) Boudnik
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Fri, Dec 17, 2010 at 16:17, Konstantin Boudnik <cos@apache.org> wrote:
> >>>>>>>>>>>>>>> Considering that because of these 4 faulty cases every patch will be
> >>>>>>>>>>>>>>> -1'ed a patch author will still have to look at it and make a comment
> >>>>>>>>>>>>>>> why this particular -1 isn't valid. Lesser work, perhaps, but messier
> >>>>>>>>>>>>>>> IMO. I'm not blocking it - I just feel like there's a better way.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>> Take care,
> >>>>>>>>>>>>>>> Konstantin (Cos) Boudnik
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Fri, Dec 17, 2010 at 15:55, Jakob Homan <jghoman@gmail.com> wrote:
> >>>>>>>>>>>>>>>>> If HDFS is added to the test-patch queue right now we get
> >>>>>>>>>>>>>>>>> nothing but dozens of -1'ed patches.
> >>>>>>>>>>>>>>>> There aren't dozens of patches being submitted currently. The -1
> >>>>>>>>>>>>>>>> isn't the important thing, it's the grunt work of actually running
> >>>>>>>>>>>>>>>> (and waiting) for the tests, test-patch, etc. that Hudson does so that
> >>>>>>>>>>>>>>>> the developer doesn't have to.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Fri, Dec 17, 2010 at 3:48 PM, Dhruba Borthakur <dhruba@gmail.com> wrote:
> >>>>>>>>>>>>>>>>> +1, thanks for doing this.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Fri, Dec 17, 2010 at 3:19 PM, Jakob Homan <jghoman@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> So, with test-patch updated to show the failing tests, saving the
> >>>>>>>>>>>>>>>>>> developers the need to go and verify that the failed tests are all
> >>>>>>>>>>>>>>>>>> known, how do people feel about turning on test-patch again for HDFS
> >>>>>>>>>>>>>>>>>> and mapred?  I think it'll help prevent any more tests from entering
> >>>>>>>>>>>>>>>>>> the "yeah, we know" category.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>>>> jg
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Wed, Nov 17, 2010 at 5:08 PM, Jakob Homan <jhoman@yahoo-inc.com> wrote:
> >>>>>>>>>>>>>>>>>>> True, each patch would get a -1 and the failing tests would need to be
> >>>>>>>>>>>>>>>>>>> verified as those known bad (BTW, it would be great if Hudson could list
> >>>>>>>>>>>>>>>>>>> which tests failed in the message it posts to JIRA).  But that's still quite
> >>>>>>>>>>>>>>>>>>> a bit less error-prone work than if the developer runs the tests and
> >>>>>>>>>>>>>>>>>>> test-patch themselves.  Also, with 22 being cut, there are a lot of patches
> >>>>>>>>>>>>>>>>>>> up in the air and several developers are juggling multiple patches.  The
> >>>>>>>>>>>>>>>>>>> more automation we can have, even if it's not perfect, will decrease errors
> >>>>>>>>>>>>>>>>>>> we may make.
> >>>>>>>>>>>>>>>>>>> -jg
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Nigel Daley wrote:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On Nov 17, 2010, at 3:11 PM, Jakob Homan wrote:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> It's also ready to run on MapReduce and HDFS but we won't turn it on
> >>>>>>>>>>>>>>>>>>>>>> until these projects build and test cleanly.  Looks like both these projects
> >>>>>>>>>>>>>>>>>>>>>> currently have test failures.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Assuming the projects are compiling and building, is there a reason to
> >>>>>>>>>>>>>>>>>>>>> not turn it on despite the test failures? Hudson is invaluable to developers
> >>>>>>>>>>>>>>>>>>>>> who then don't have to run the tests and test-patch themselves.  We didn't
> >>>>>>>>>>>>>>>>>>>>> turn Hudson off when it was working previously and there were known
> >>>>>>>>>>>>>>>>>>>>> failures.  I think one of the reasons we have more failing tests now is the
> >>>>>>>>>>>>>>>>>>>>> higher cost of doing Hudson's work (not a great excuse I know).  This is
> >>>>>>>>>>>>>>>>>>>>> particularly true now because several of the failing tests involve tests
> >>>>>>>>>>>>>>>>>>>>> timing out, making the whole testing regime even longer.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Every single patch would get a -1 and need investigation.  Currently, that
> >>>>>>>>>>>>>>>>>>>> would be about 83 investigations between MR and HDFS issues that are in
> >>>>>>>>>>>>>>>>>>>> patch available state.  Shouldn't we focus on getting these tests fixed or
> >>>>>>>>>>>>>>>>>>>> removed?  Also, I need to get MAPREDUCE-2172 fixed (applies to HDFS as
> >>>>>>>>>>>>>>>>>>>> well) before I turn this on.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>>>>>> Nige
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>> Connect to me at http://www.facebook.com/dhruba
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> Todd Lipcon
> >>>>>>>> Software Engineer, Cloudera
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> Todd Lipcon
> >>>>>> Software Engineer, Cloudera
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> Todd Lipcon
> >>>> Software Engineer, Cloudera
> >>>
> >>>
> >>
> >>
> >> --
> >> Todd Lipcon
> >> Software Engineer, Cloudera
> >
>
>


-- 
Todd Lipcon
Software Engineer, Cloudera
