Mailing-List: contact general-help@hadoop.apache.org; run by ezmlm
Reply-To: general@hadoop.apache.org
MIME-Version: 1.0
From: Todd Lipcon
Date: Fri, 7 Jan 2011 17:14:28 -0800
Subject: Re: Patch testing
To: general@hadoop.apache.org
Content-Type: multipart/alternative; boundary=00235433303a003c6704994b76fe

--00235433303a003c6704994b76fe
Content-Type: text/plain; charset=ISO-8859-1

On Fri, Jan 7, 2011 at 2:11 PM, Nigel Daley wrote:

> Hrm, the MR precommit test I'm running has hung (been running for 14 hours
> so far). FWIW, 2 HDFS precommit tests are hung too. I suspect it could be
> the NFS mounts on the machines. I forced a thread dump which you can see in
> the console:
> https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/10/console
>

Strange, haven't seen a hang like that before in handleConnectionFailure. It
should retry for 15 minutes max in that loop.

> Any other ideas why these might be hanging?
>

There is an HDFS bug right now that can cause hangs on some tests -
HDFS-1529 - would appreciate if someone can take a look. But I don't think
this is responsible for the MR hang above.

-Todd

> On Jan 5, 2011, at 5:42 PM, Todd Lipcon wrote:
>
> > On Wed, Jan 5, 2011 at 4:39 PM, Nigel Daley wrote:
> >
> >> Thanks for looking into it Todd. Let's first see if you think it can be
> >> fixed quickly. Let me know.
> >>
> >
> > No problem, it wasn't too bad after all. Patch up on HADOOP-7087 which
> > fixes this test timeout for me.
> >
> > -Todd
> >
> >> On Jan 5, 2011, at 4:33 PM, Todd Lipcon wrote:
> >>
> >>> On Wed, Jan 5, 2011 at 4:19 PM, Nigel Daley wrote:
> >>>
> >>>> Todd, would love to get
> >>>> https://issues.apache.org/jira/browse/MAPREDUCE-2121 fixed first since
> >>>> this is failing every night on trunk.
> >>>>
> >>>
> >>> What if we disable that test, move that issue to 0.22 blocker, and then
> >>> enable the test-patch? I'll also look into that one today, but if it's
> >>> something that will take a while to fix, I don't think we should hold off
> >>> the useful testing for all the other patches.
> >>>
> >>> -Todd
> >>>
> >>> On Jan 5, 2011, at 2:45 PM, Todd Lipcon wrote:
> >>>>
> >>>>> Hi Nigel,
> >>>>>
> >>>>> MAPREDUCE-2172 has been fixed for a while. Are there any other particular
> >>>>> JIRAs you think need to be fixed before the MR test-patch queue gets
> >>>>> enabled? I have a lot of outstanding patches and doing all the test-patch
> >>>>> turnaround manually on 3 different boxes is a real headache.
> >>>>>
> >>>>> Thanks
> >>>>> -Todd
> >>>>>
> >>>>> On Tue, Dec 21, 2010 at 1:33 PM, Nigel Daley wrote:
> >>>>>
> >>>>>> Ok, HDFS is now enabled. You'll see a stream of updates shortly on the
> >>>>>> ~30 Patch Available HDFS issues.
> >>>>>>
> >>>>>> Nige
> >>>>>>
> >>>>>> On Dec 20, 2010, at 12:42 PM, Jakob Homan wrote:
> >>>>>>
> >>>>>>> I committed HDFS-1511 this morning. We should be good to go. I can
> >>>>>>> haz snooty robot butler?
> >>>>>>>
> >>>>>>> On Fri, Dec 17, 2010 at 8:31 PM, Konstantin Boudnik <cos@apache.org> wrote:
> >>>>>>>> Thanks Jacob. I am wasted already but I can do it on Sun, I think,
> >>>>>>>> unless it is done earlier.
> >>>>>>>> --
> >>>>>>>> Take care,
> >>>>>>>> Konstantin (Cos) Boudnik
> >>>>>>>>
> >>>>>>>> On Fri, Dec 17, 2010 at 19:41, Jakob Homan wrote:
> >>>>>>>>> Ok. I'll get a patch out for 1511 tomorrow, unless someone wants to
> >>>>>>>>> whip one up tonight.
> >>>>>>>>>
> >>>>>>>>> On Fri, Dec 17, 2010 at 7:22 PM, Nigel Daley wrote:
> >>>>>>>>>> I agree with Cos on fixing HDFS-1511 first. Once that is done I'll
> >>>>>>>>>> enable hdfs patch testing.
> >>>>>>>>>>
> >>>>>>>>>> Cheers,
> >>>>>>>>>> Nige
> >>>>>>>>>>
> >>>>>>>>>> Sent from my iPhone4
> >>>>>>>>>>
> >>>>>>>>>> On Dec 17, 2010, at 7:01 PM, Konstantin Boudnik wrote:
> >>>>>>>>>>
> >>>>>>>>>>> One more issue needs to be addressed before test-patch is turned on
> >>>>>>>>>>> HDFS is
> >>>>>>>>>>> https://issues.apache.org/jira/browse/HDFS-1511
> >>>>>>>>>>> --
> >>>>>>>>>>> Take care,
> >>>>>>>>>>> Konstantin (Cos) Boudnik
> >>>>>>>>>>>
> >>>>>>>>>>> On Fri, Dec 17, 2010 at 16:17, Konstantin Boudnik <cos@apache.org> wrote:
> >>>>>>>>>>>> Considering that because of these 4 faulty cases every patch will be
> >>>>>>>>>>>> -1'ed a patch author will still have to look at it and make a comment
> >>>>>>>>>>>> why this particular -1 isn't valid. Lesser work, perhaps, but messier
> >>>>>>>>>>>> IMO. I'm not blocking it - I just feel like there's a better way.
> >>>>>>>>>>>>
> >>>>>>>>>>>> --
> >>>>>>>>>>>> Take care,
> >>>>>>>>>>>> Konstantin (Cos) Boudnik
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Fri, Dec 17, 2010 at 15:55, Jakob Homan wrote:
> >>>>>>>>>>>>>> If HDFS is added to the test-patch queue right now we get
> >>>>>>>>>>>>>> nothing but dozens of -1'ed patches.
> >>>>>>>>>>>>> There aren't dozens of patches being submitted currently. The -1
> >>>>>>>>>>>>> isn't the important thing, it's the grunt work of actually running
> >>>>>>>>>>>>> (and waiting) for the tests, test-patch, etc. that Hudson does so that
> >>>>>>>>>>>>> the developer doesn't have to.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Fri, Dec 17, 2010 at 3:48 PM, Dhruba Borthakur <dhruba@gmail.com> wrote:
> >>>>>>>>>>>>>> +1, thanks for doing this.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Fri, Dec 17, 2010 at 3:19 PM, Jakob Homan <jghoman@gmail.com> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> So, with test-patch updated to show the failing tests, saving the
> >>>>>>>>>>>>>>> developers the need to go and verify that the failed tests are all
> >>>>>>>>>>>>>>> known, how do people feel about turning on test-patch again for HDFS
> >>>>>>>>>>>>>>> and mapred? I think it'll help prevent any more tests from entering
> >>>>>>>>>>>>>>> the "yeah, we know" category.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>> jg
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Wed, Nov 17, 2010 at 5:08 PM, Jakob Homan <jhoman@yahoo-inc.com> wrote:
> >>>>>>>>>>>>>>>> True, each patch would get a -1 and the failing tests would need to be
> >>>>>>>>>>>>>>>> verified as those known bad (BTW, it would be great if Hudson could list
> >>>>>>>>>>>>>>>> which tests failed in the message it posts to JIRA). But that's still quite
> >>>>>>>>>>>>>>>> a bit less error-prone work than if the developer runs the tests and
> >>>>>>>>>>>>>>>> test-patch themselves. Also, with 22 being cut, there are a lot of patches
> >>>>>>>>>>>>>>>> up in the air and several developers are juggling multiple patches. The
> >>>>>>>>>>>>>>>> more automation we can have, even if it's not perfect, will decrease errors
> >>>>>>>>>>>>>>>> we may make.
> >>>>>>>>>>>>>>>> -jg
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Nigel Daley wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Nov 17, 2010, at 3:11 PM, Jakob Homan wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> It's also ready to run on MapReduce and HDFS but we won't turn it on
> >>>>>>>>>>>>>>>>>>> until these projects build and test cleanly. Looks like both these projects
> >>>>>>>>>>>>>>>>>>> currently have test failures.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Assuming the projects are compiling and building, is there a reason to
> >>>>>>>>>>>>>>>>>> not turn it on despite the test failures? Hudson is invaluable to developers
> >>>>>>>>>>>>>>>>>> who then don't have to run the tests and test-patch themselves. We didn't
> >>>>>>>>>>>>>>>>>> turn Hudson off when it was working previously and there were known
> >>>>>>>>>>>>>>>>>> failures. I think one of the reasons we have more failing tests now is the
> >>>>>>>>>>>>>>>>>> higher cost of doing Hudson's work (not a great excuse I know). This is
> >>>>>>>>>>>>>>>>>> particularly true now because several of the failing tests involve tests
> >>>>>>>>>>>>>>>>>> timing out, making the whole testing regime even longer.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Every single patch would get a -1 and need investigation. Currently, that
> >>>>>>>>>>>>>>>>> would be about 83 investigations between MR and HDFS issues that are in
> >>>>>>>>>>>>>>>>> patch available state. Shouldn't we focus on getting these tests fixed or
> >>>>>>>>>>>>>>>>> removed/? Also, I need to get MAPREDUCE-2172 fixed (applies to HDFS as
> >>>>>>>>>>>>>>>>> well) before I turn this on.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>>> Nige
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> --
> >>>>>>>>>>>>>> Connect to me at http://www.facebook.com/dhruba
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>
> >>>>> --
> >>>>> Todd Lipcon
> >>>>> Software Engineer, Cloudera
> >>>>
> >>>
> >>> --
> >>> Todd Lipcon
> >>> Software Engineer, Cloudera
> >>
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
>

--
Todd Lipcon
Software Engineer, Cloudera

--00235433303a003c6704994b76fe--