From: Mikhail Antonov
Date: Thu, 26 Jun 2014 16:59:55 -0700
Subject: Re: Planning to roll the 0.98.4 RC on 6/30
To: "dev@hbase.apache.org"

And if you disable forking completely, do the tests always pass for you, or
do they also fail intermittently?


2014-06-26 15:59 GMT-07:00 Andrew Purtell:

> Additionally we run unit tests in parallel to reduce the total time
> required for test suite execution. Surefire will fork multiple JVMs,
> dynamically generate test jars containing a subset of tests, and run them.
> That can make isolating hanging tests difficult, but this behavior can be
> influenced by defines on the Maven command line. For example, to fork a
> process for every single unit test:
>
>     mvn test -Dsurefire.firstPartForkMode=always -Dsurefire.secondPartForkMode=always
>
> And then if you find a hanging surefire runner, you can dump thread stacks
> of that JVM and know that only the unit test whose methods appear in the
> stacks contributed to the current wedged state.
>
>
> On Thu, Jun 26, 2014 at 3:48 PM, Andrew Purtell wrote:
>
> > Java 7u60 64-bit on an EC2 m3.4xlarge. Just running the unit test suite
> > in a loop. I don't set any special Maven options in MVN_OPTS or anything
> > like that.
> >
> > Historically, failures that occur when the suite executes, but not when
> > tests are run individually, happen because one test does not shut down
> > in a timely manner, or at all, and a subsequent test might use the same
> > hardcoded path or port. When that happens we have a sporadic and
> > sometimes load-sensitive failure. Complicating matters, each time one
> > clones a repository on a different host or filesystem, JUnit may pick up
> > a different test order, influenced by whatever readdir hands back for
> > each package.
> >
> >
> > On Thu, Jun 26, 2014 at 3:25 PM, Mikhail Antonov wrote:
> >
> >> Andrew,
> >>
> >> Could you share some details - on what env. you're running the tests,
> >> and at which point do they fail? I'm curious because lately I'm seeing
> >> weird failures on current master too, which do not happen on hadoop-qa -
> >> individual tests always pass, but when running the suite, tests either
> >> get stuck and time out (at roughly the same point), or fail with an NPE
> >> or PermGen exception. I've been blaming my environment first, but maybe
> >> it's something related.
> >>
> >> -Mikhail
> >>
> >>
> >> 2014-06-26 13:39 GMT-07:00 Andrew Purtell:
> >>
> >> > I'm finding that repeated runs of the unit test suite at the head of
> >> > branch 0.98 intermittently fail. Individual tests do not, so this is
> >> > likely a lagging shutdown, port/resource conflict, and/or zombie test
> >> > issue. I am currently bisecting commits on the 0.98 branch since the
> >> > last release in the hope of pinning this down to a single change.
> >> > Depending on how quickly that can happen, the RC might happen on
> >> > Monday or not. As things stand at the head of the branch, I'd not +1
> >> > the RC given the release criteria I've been using up to now.
> >> >
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>

--
Thanks,
Michael Antonov
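
On the hardcoded-port collisions mentioned in the quoted thread: one common way for a test to avoid competing for a fixed port is to bind to port 0 and let the OS assign a free one. The following is only a minimal sketch under that assumption and is not taken from HBase; the class name EphemeralPortExample and the method openOnFreePort are hypothetical.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;

// Hypothetical helper, not HBase code: demonstrates OS-assigned (ephemeral) ports.
final class EphemeralPortExample {

    // Binding to port 0 asks the OS to pick a currently unused port.
    // The caller keeps the returned socket open, which is what reserves the port.
    static ServerSocket openOnFreePort() throws IOException {
        return new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
    }

    public static void main(String[] args) throws IOException {
        // Two listeners opened at the same time cannot share a port,
        // so each one reports a different port number.
        try (ServerSocket first = openOnFreePort();
             ServerSocket second = openOnFreePort()) {
            System.out.println("first listener on port " + first.getLocalPort());
            System.out.println("second listener on port " + second.getLocalPort());
        }
    }
}
```

Because each socket stays open while in use, a later test that asks for a port the same way gets a different one even if an earlier test has not shut down yet. This does nothing for leftover hardcoded paths, which would need a per-test temporary directory or similar.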