From: Sudheesh Katkam
To: "dev@drill.apache.org"
Subject: Re: TestDrillbitResilience broken? assertion errors; now slow/hung, with 278 threads!
Date: Wed, 29 Apr 2015 08:07:05 -0700
Message-Id: <921C02F9-9733-4585-9290-AAE798797F67@maprtech.com>
In-Reply-To: <269638FA-9B80-4F81-A0AA-A8C48B32BFDA@maprtech.com>
References: <55408487.7050106@maprtech.com> <269638FA-9B80-4F81-A0AA-A8C48B32BFDA@maprtech.com>

*ran the tests before checking them in.

> On Apr 29, 2015, at 7:53 AM, Sudheesh Katkam wrote:
>
> I am responsible for those tests. I ran the tests at least 10 times on my Linux VM with 1 second pauses, all of which passed.
>
> On your second run, what different errors did you see?
>
> On your third run, are you able to reproduce the test case that hangs?
>
> Sorry that the message is not informative. I already have a patch which is a slight improvement to Jacques' change that improves the message in those tests.
>
> What tool did you use to get the thread count?
>
> - Sudheesh
>
> Sent from my iPhone. Pardon any typos.
>
>> On Apr 29, 2015, at 6:28 AM, Abdel Hakim Deneche wrote:
>>
>> The message displayed in the first run actually contains two different
>> issues:
>>
>> 1. The error message "Error shutting down Drillbit 'beta'" is most likely
>> caused by this issue DRILL-2878
>>
>> 2. The test that failed with a "java.lang.AssertionError: null" is most
>> likely a bug because that unit test should not fail. I've seen this error
>> before, but it only happens intermittently.
>>
>> The system error reported in the 3rd run is actually an "expected" injected
>> exception, but 278 threads looks suspicious!!!
>>
>> On Wed, Apr 29, 2015 at 12:13 AM, Daniel Barclay
>> wrote:
>>
>>> Does anyone know what's going on with TestDrillbitResilience (rebased
>>> from master today)? (Is it working right?)
>>>
>>> One run, via "mvn install", yielded assertion errors:
>>>
>>> ...
>>> Error shutting down Drillbit "beta".
>>> Tests run: 11, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 33.811 sec <<< FAILURE! - in org.apache.drill.exec.server.TestDrillbitResilience
>>> cancelAfterEverythingIsCompleted(org.apache.drill.exec.server.TestDrillbitResilience) Time elapsed: 1.468 sec <<< FAILURE!
>>> java.lang.AssertionError: null
>>> at org.apache.drill.exec.server.TestDrillbitResilience.assertCancelled(TestDrillbitResilience.java:459)
>>> at org.apache.drill.exec.server.TestDrillbitResilience.cancelAfterEverythingIsCompleted(TestDrillbitResilience.java:565)
>>>
>>> cancelInMiddleOfFetchingResults(org.apache.drill.exec.server.TestDrillbitResilience) Time elapsed: 1.496 sec <<< FAILURE!
>>> java.lang.AssertionError: null
>>> at org.apache.drill.exec.server.TestDrillbitResilience.assertCancelled(TestDrillbitResilience.java:459)
>>> at org.apache.drill.exec.server.TestDrillbitResilience.cancelInMiddleOfFetchingResults(TestDrillbitResilience.java:510)
>>>
>>> Running
>>> ...
>>>
>>> A second run, run individually (but still via Maven) died with different
>>> errors.
>>>
>>> A third run, via "mvn install" again, seems hung after reporting this
>>> (maybe expected) exception:
>>>
>>> Exception (no rows returned):
>>> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
>>> run-try-end
>>>
>>> [fb9cfe61-af6e-4c9c-b6ab-8a1b8725c6e9 on dev-linux2:31010]
>>>
>>> The process is using only about 5% CPU--but has 278 threads!
>>> (That includes about 35 threads all with the same name of "BitClient-1".)
>>>
>>> Daniel
>>>
>>> --
>>> Daniel Barclay
>>> MapR Technologies
>>
>> --
>>
>> Abdelhakim Deneche
>>
>> Software Engineer