drill-dev mailing list archives

From Abdel Hakim Deneche <adene...@maprtech.com>
Subject Re: TestDrillbitResilience broken? assertion errors; now slow/hung, with 278 threads!
Date Wed, 29 Apr 2015 16:28:24 GMT
On Wed, Apr 29, 2015 at 9:15 AM, Jacques Nadeau <jacques@apache.org> wrote:

> Quick question re 10 runs: are these runs that are in parallel with all the
> unit tests or just this test?
>
> The other question is: how do we construct these tests so that it is
> extremely unlikely to get a failure even if processing is slow or threads
> are suspended?
>

The first problem we hit when processing is slow is JUnit timeouts. Once a
unit test times out, its corresponding query isn't cancelled and may
continue running in parallel with other unit tests from the same test class.
When the @AfterClass method then shuts down the drillbits, they may complain
about allocators not being closed because some queries are actually still
running.
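
To make the timeout problem concrete, here is a minimal sketch in plain
java.util.concurrent (not Drill or JUnit code). The class and method names
are hypothetical, and the blocking task is only a stand-in for a query. It
shows that giving up on a wait does not stop the work being waited on
unless it is cancelled explicitly:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration only; none of these names exist in Drill.
public class TimeoutCleanupSketch {

  /**
   * Starts a stand-in "query" that blocks until released or interrupted,
   * waits for it with a short timeout, and reports whether the query
   * thread had finished shortly after the timeout fired.
   */
  public static boolean queryStoppedAfterTimeout(boolean cancelOnTimeout)
      throws Exception {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    CountDownLatch release = new CountDownLatch(1);   // never released while "running"
    CountDownLatch finished = new CountDownLatch(1);  // counted down when the query exits
    try {
      Future<?> query = executor.submit(() -> {
        try {
          release.await(); // stand-in for a long-running query
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        } finally {
          finished.countDown();
        }
      });
      try {
        query.get(100, TimeUnit.MILLISECONDS); // analogous to a test timeout
      } catch (TimeoutException e) {
        if (cancelOnTimeout) {
          query.cancel(true); // interrupts the worker thread
        }
        // Without the cancel, the worker simply keeps running.
      }
      // Give the worker a moment to exit, then report whether it did.
      return finished.await(500, TimeUnit.MILLISECONDS);
    } finally {
      // Cleanup for the sketch itself, so no thread is leaked here.
      release.countDown();
      executor.shutdownNow();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println("stopped without cancel: " + queryStoppedAfterTimeout(false));
    System.out.println("stopped with cancel:    " + queryStoppedAfterTimeout(true));
  }
}
```

In this sketch the task is still alive after the timeout unless it is
cancelled, which mirrors the case above where a timed-out test leaves its
query running until the drillbits are shut down.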


> On Wed, Apr 29, 2015 at 7:53 AM, Sudheesh Katkam <skatkam@maprtech.com>
> wrote:
>
> > I am responsible for those tests. I ran the tests at least 10 times on my
> > Linux VM with 1 second pauses, all of which passed.
> >
> > On your second run, what different errors did you see?
> >
> > On your third run, are you able to reproduce the test case that hangs?
> >
> > Sorry that the message is not informative. I already have a patch which is
> > a slight improvement to Jacques's change that improves the message in those
> > tests.
> >
> > What tool did you use to get the thread count?
> >
> > - Sudheesh
> >
> > Sent from my iPhone. Pardon any typos.
> >
> > > On Apr 29, 2015, at 6:28 AM, Abdel Hakim Deneche <adeneche@maprtech.com> wrote:
> > >
> > > > The message displayed in the first run actually contains two different
> > > > issues:
> > > >
> > > > 1. The error message "Error shutting down Drillbit 'beta'" is most likely
> > > > caused by this issue: DRILL-2878
> > > > <https://issues.apache.org/jira/browse/DRILL-2878>
> > > >
> > > > 2. The test that failed with a "java.lang.AssertionError: null" is most
> > > > likely a bug because that unit test should not fail. I've seen this error
> > > > before, but it only happens intermittently.
> > > >
> > > > The system error reported in the 3rd run is actually an "expected" injected
> > > > exception, but 278 threads looks suspicious!
> > >
> > > > On Wed, Apr 29, 2015 at 12:13 AM, Daniel Barclay <dbarclay@maprtech.com> wrote:
> > >
> > >> Does anyone know what's going on with TestDrillbitResilience (rebased
> > >> from master today)?  (Is it working right?)
> > >>
> > >>
> > >> One run, via "mvn install", yielded assertion errors:
> > >>
> > >> ...
> > >> Error shutting down Drillbit "beta".
> > >> Tests run: 11, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 33.811 sec <<< FAILURE! - in org.apache.drill.exec.server.TestDrillbitResilience
> > >> cancelAfterEverythingIsCompleted(org.apache.drill.exec.server.TestDrillbitResilience) Time elapsed: 1.468 sec  <<< FAILURE!
> > >> java.lang.AssertionError: null
> > >>        at org.apache.drill.exec.server.TestDrillbitResilience.assertCancelled(TestDrillbitResilience.java:459)
> > >>        at org.apache.drill.exec.server.TestDrillbitResilience.cancelAfterEverythingIsCompleted(TestDrillbitResilience.java:565)
> > >>
> > >> cancelInMiddleOfFetchingResults(org.apache.drill.exec.server.TestDrillbitResilience) Time elapsed: 1.496 sec  <<< FAILURE!
> > >> java.lang.AssertionError: null
> > >>        at org.apache.drill.exec.server.TestDrillbitResilience.assertCancelled(TestDrillbitResilience.java:459)
> > >>        at org.apache.drill.exec.server.TestDrillbitResilience.cancelInMiddleOfFetchingResults(TestDrillbitResilience.java:510)
> > >>
> > >> Running <next test>
> > >> ...
> > >>
> > >>
> > >> A second run, run individually (but still via Maven) died with different
> > >> errors.
> > >>
> > >>
> > >>
> > >> A third run, via "mvn install" again, seems hung after reporting this
> > >> (maybe expected) exception:
> > >>
> > >> Exception (no rows returned):
> > >> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
> > >> run-try-end
> > >>
> > >>
> > >> [fb9cfe61-af6e-4c9c-b6ab-8a1b8725c6e9 on dev-linux2:31010]
> > >>
> > >>
> > >> The process is using only about 5% CPU--but has 278 threads!
> > >> (That includes about 35 threads all with the same name of "BitClient-1".)
> > >>
> > >>
> > >> Daniel
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> --
> > >> Daniel Barclay
> > >> MapR Technologies
> > >
> > >
> > >
> > > --
> > >
> > > Abdelhakim Deneche
> > >
> > > Software Engineer
> > >
> > >  <http://www.mapr.com/>
> > >
> > >
> > > Now Available - Free Hadoop On-Demand Training
> > > <
> >
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> > >
> >
>



-- 

Abdelhakim Deneche

Software Engineer

