From: Nigel Daley
Subject: Re: Patch testing
Date: Wed, 26 Jan 2011 10:05:42 -0800
To: general@hadoop.apache.org

raid (contrib) test hanging: TestBlockFixer

I forced 2 thread dumps. Both hung in the same place. Filed
https://issues.apache.org/jira/browse/MAPREDUCE-2283

This is a blocker for turning on MR precommit.

Cheers,
Nige

On Jan 25, 2011, at 11:19 PM, Nigel Daley wrote:

> Started another trial run of MR precommit testing:
> https://hudson.apache.org/hudson/view/G-L/view/Hadoop/job/PreCommit-MAPREDUCE-Build/17/
>
> Let's see if 17th time is a charm...
>
> Nige
>
> On Jan 7, 2011, at 5:14 PM, Todd Lipcon wrote:
>
>> On Fri, Jan 7, 2011 at 2:11 PM, Nigel Daley wrote:
>>
>>> Hrm, the MR precommit test I'm running has hung (been running for 14 hours
>>> so far). FWIW, 2 HDFS precommit tests are hung too. I suspect it could be
>>> the NFS mounts on the machines. I forced a thread dump which you can see in
>>> the console:
>>> https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/10/console
>>>
>> Strange, haven't seen a hang like that before in handleConnectionFailure.
>> It should retry for 15 minutes max in that loop.
>>
>>> Any other ideas why these might be hanging?
>>>
>> There is an HDFS bug right now that can cause hangs on some tests -
>> HDFS-1529 - would appreciate if someone can take a look. But I don't think
>> this is responsible for the MR hang above.
>>
>> -Todd
>>
>>> On Jan 5, 2011, at 5:42 PM, Todd Lipcon wrote:
>>>
>>>> On Wed, Jan 5, 2011 at 4:39 PM, Nigel Daley wrote:
>>>>
>>>>> Thanks for looking into it, Todd. Let's first see if you think it can be
>>>>> fixed quickly. Let me know.
>>>>>
>>>> No problem, it wasn't too bad after all. Patch up on HADOOP-7087 which fixes
>>>> this test timeout for me.
>>>>
>>>> -Todd
>>>>
>>>>> On Jan 5, 2011, at 4:33 PM, Todd Lipcon wrote:
>>>>>
>>>>>> On Wed, Jan 5, 2011 at 4:19 PM, Nigel Daley wrote:
>>>>>>
>>>>>>> Todd, would love to get
>>>>>>> https://issues.apache.org/jira/browse/MAPREDUCE-2121 fixed first since
>>>>>>> this is failing every night on trunk.
>>>>>>>
>>>>>> What if we disable that test, move that issue to 0.22 blocker, and then
>>>>>> enable the test-patch? I'll also look into that one today, but if it's
>>>>>> something that will take a while to fix, I don't think we should hold off
>>>>>> the useful testing for all the other patches.
>>>>>>
>>>>>> -Todd
>>>>>>
>>>>>>> On Jan 5, 2011, at 2:45 PM, Todd Lipcon wrote:
>>>>>>>
>>>>>>>> Hi Nigel,
>>>>>>>>
>>>>>>>> MAPREDUCE-2172 has been fixed for a while. Are there any other particular
>>>>>>>> JIRAs you think need to be fixed before the MR test-patch queue gets
>>>>>>>> enabled? I have a lot of outstanding patches and doing all the test-patch
>>>>>>>> turnaround manually on 3 different boxes is a real headache.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> -Todd
>>>>>>>>
>>>>>>>> On Tue, Dec 21, 2010 at 1:33 PM, Nigel Daley wrote:
>>>>>>>>
>>>>>>>>> Ok, HDFS is now enabled.
>>>>>>>>> You'll see a stream of updates shortly on the ~30
>>>>>>>>> Patch Available HDFS issues.
>>>>>>>>>
>>>>>>>>> Nige
>>>>>>>>>
>>>>>>>>> On Dec 20, 2010, at 12:42 PM, Jakob Homan wrote:
>>>>>>>>>
>>>>>>>>>> I committed HDFS-1511 this morning. We should be good to go. I can
>>>>>>>>>> haz snooty robot butler?
>>>>>>>>>>
>>>>>>>>>> On Fri, Dec 17, 2010 at 8:31 PM, Konstantin Boudnik <cos@apache.org> wrote:
>>>>>>>>>>> Thanks Jakob. I am wasted already but I can do it on Sun, I think,
>>>>>>>>>>> unless it is done earlier.
>>>>>>>>>>> --
>>>>>>>>>>> Take care,
>>>>>>>>>>> Konstantin (Cos) Boudnik
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Dec 17, 2010 at 19:41, Jakob Homan wrote:
>>>>>>>>>>>> Ok. I'll get a patch out for 1511 tomorrow, unless someone wants to
>>>>>>>>>>>> whip one up tonight.
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Dec 17, 2010 at 7:22 PM, Nigel Daley wrote:
>>>>>>>>>>>>> I agree with Cos on fixing HDFS-1511 first. Once that is done I'll
>>>>>>>>>>>>> enable hdfs patch testing.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>> Nige
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Dec 17, 2010, at 7:01 PM, Konstantin Boudnik wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> One more issue needs to be addressed before test-patch is turned on
>>>>>>>>>>>>>> for HDFS: https://issues.apache.org/jira/browse/HDFS-1511
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Take care,
>>>>>>>>>>>>>> Konstantin (Cos) Boudnik
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Dec 17, 2010 at 16:17, Konstantin Boudnik <cos@apache.org> wrote:
>>>>>>>>>>>>>>> Considering that, because of these 4 faulty cases, every patch will be
>>>>>>>>>>>>>>> -1'ed, a patch author will still have to look at it and make a comment
>>>>>>>>>>>>>>> on why this particular -1 isn't valid.
>>>>>>>>>>>>>>> Lesser work, perhaps, but messier IMO. I'm not blocking it - I just feel
>>>>>>>>>>>>>>> like there's a better way.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Take care,
>>>>>>>>>>>>>>> Konstantin (Cos) Boudnik
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Dec 17, 2010 at 15:55, Jakob Homan wrote:
>>>>>>>>>>>>>>>>> If HDFS is added to the test-patch queue right now we get
>>>>>>>>>>>>>>>>> nothing but dozens of -1'ed patches.
>>>>>>>>>>>>>>>> There aren't dozens of patches being submitted currently. The -1
>>>>>>>>>>>>>>>> isn't the important thing, it's the grunt work of actually running
>>>>>>>>>>>>>>>> (and waiting) for the tests, test-patch, etc. that Hudson does so that
>>>>>>>>>>>>>>>> the developer doesn't have to.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, Dec 17, 2010 at 3:48 PM, Dhruba Borthakur <dhruba@gmail.com> wrote:
>>>>>>>>>>>>>>>>> +1, thanks for doing this.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Fri, Dec 17, 2010 at 3:19 PM, Jakob Homan <jghoman@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> So, with test-patch updated to show the failing tests, saving the
>>>>>>>>>>>>>>>>>> developers the need to go and verify that the failed tests are all
>>>>>>>>>>>>>>>>>> known, how do people feel about turning on test-patch again for HDFS
>>>>>>>>>>>>>>>>>> and mapred? I think it'll help prevent any more tests from entering
>>>>>>>>>>>>>>>>>> the "yeah, we know" category.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> jg
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Wed, Nov 17, 2010 at 5:08 PM, Jakob Homan <jhoman@yahoo-inc.com> wrote:
>>>>>>>>>>>>>>>>>>> True, each patch would get a -1 and the failing tests would need to be
>>>>>>>>>>>>>>>>>>> verified as those known bad (BTW, it would be great if Hudson could list
>>>>>>>>>>>>>>>>>>> which tests failed in the message it posts to JIRA). But that's still quite
>>>>>>>>>>>>>>>>>>> a bit less error-prone work than if the developer runs the tests and
>>>>>>>>>>>>>>>>>>> test-patch themselves. Also, with 22 being cut, there are a lot of patches
>>>>>>>>>>>>>>>>>>> up in the air and several developers are juggling multiple patches. More
>>>>>>>>>>>>>>>>>>> automation, even if it's not perfect, will decrease the errors we may make.
>>>>>>>>>>>>>>>>>>> -jg
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Nigel Daley wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Nov 17, 2010, at 3:11 PM, Jakob Homan wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> It's also ready to run on MapReduce and HDFS but we won't turn it on
>>>>>>>>>>>>>>>>>>>>>> until these projects build and test cleanly. Looks like both these projects
>>>>>>>>>>>>>>>>>>>>>> currently have test failures.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Assuming the projects are compiling and building, is there a reason to
>>>>>>>>>>>>>>>>>>>>> not turn it on despite the test failures? Hudson is invaluable to developers
>>>>>>>>>>>>>>>>>>>>> who then don't have to run the tests and test-patch themselves.
>>>>>>>>>>>>>>>>>>>>> We didn't turn Hudson off when it was working previously and there were known
>>>>>>>>>>>>>>>>>>>>> failures. I think one of the reasons we have more failing tests now is the
>>>>>>>>>>>>>>>>>>>>> higher cost of doing Hudson's work (not a great excuse, I know). This is
>>>>>>>>>>>>>>>>>>>>> particularly true now because several of the failing tests involve tests
>>>>>>>>>>>>>>>>>>>>> timing out, making the whole testing regime even longer.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Every single patch would get a -1 and need investigation. Currently, that
>>>>>>>>>>>>>>>>>>>> would be about 83 investigations between MR and HDFS issues that are in
>>>>>>>>>>>>>>>>>>>> patch available state. Shouldn't we focus on getting these tests fixed or
>>>>>>>>>>>>>>>>>>>> removed? Also, I need to get MAPREDUCE-2172 fixed (applies to HDFS as
>>>>>>>>>>>>>>>>>>>> well) before I turn this on.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>> Nige
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> Connect to me at http://www.facebook.com/dhruba
>>>>>>>>
>>>>>>>> --
>>>>>>>> Todd Lipcon
>>>>>>>> Software Engineer, Cloudera
>>>>>>
>>>>>> --
>>>>>> Todd Lipcon
>>>>>> Software Engineer, Cloudera
>>>>
>>>> --
>>>> Todd Lipcon
>>>> Software Engineer, Cloudera
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera