drill-dev mailing list archives

From Jacques Nadeau <jacq...@dremio.com>
Subject Re: [VOTE] Release Apache Drill 1.3.0 (rc0)
Date Sat, 07 Nov 2015 02:44:20 GMT
I see that we're leaking WorkManager status threads: they aren't shut down
when the Drillbit is shut down.

I'll get a patch together.
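
For readers following the thread, here is a minimal sketch of the kind of leak described above and one way to close it. It is not the Drill patch: StatusReporter, THREAD_NAME and liveStatusThreads are hypothetical names invented for illustration, and only java.util.concurrent calls are used. The idea is that whatever starts a periodic status thread must also stop it on shutdown and wait for it to exit.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a service that owns a periodic status thread.
// None of these names come from the Drill code base.
final class StatusReporter implements AutoCloseable {
    static final String THREAD_NAME = "sketch-status-thread";

    private final ScheduledExecutorService executor;
    private volatile Thread worker;

    StatusReporter() {
        // Name the thread so a leak is easy to spot in a thread dump.
        executor = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, THREAD_NAME);
            t.setDaemon(true);
            worker = t;
            return t;
        });
    }

    void start() {
        // Placeholder task; a real service would publish its status here.
        executor.scheduleAtFixedRate(() -> { }, 0, 10, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        // Without this, every started instance leaves one thread behind.
        executor.shutdownNow();
        try {
            executor.awaitTermination(5, TimeUnit.SECONDS);
            Thread t = worker;
            if (t != null) {
                t.join(5000); // wait until the thread has actually exited
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Counts live threads carrying the status-thread name.
    static long liveStatusThreads() {
        return Thread.getAllStackTraces().keySet().stream()
            .filter(t -> t.isAlive() && THREAD_NAME.equals(t.getName()))
            .count();
    }

    public static void main(String[] args) {
        StatusReporter reporter = new StatusReporter();
        reporter.start();
        System.out.println("live before close: " + liveStatusThreads());
        reporter.close();
        System.out.println("live after close: " + liveStatusThreads());
    }
}
```

A test suite that starts and stops many such services without the close() step accumulates one thread per instance, which would be consistent with the "unable to create new native thread" errors quoted below.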

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Fri, Nov 6, 2015 at 4:31 PM, Hanifi Gunes <hgunes@maprtech.com> wrote:

> Looks like we are possibly leaking some threads. Investigating.
>
> On Fri, Nov 6, 2015 at 4:25 PM, Jacques Nadeau <jacques@dremio.com> wrote:
>
> > Hmm.. that is quite strange. I wonder if we need to look at thread counts
> > on the daemon.
> >
> > We haven't changed how we create but there were changes to shutdown
> > (although I can't imagine why that would be a problem).
> >
> > --
> > Jacques Nadeau
> > CTO and Co-Founder, Dremio
> >
> > On Fri, Nov 6, 2015 at 4:11 PM, Hanifi Gunes <hgunes@maprtech.com> wrote:
> >
> > > Not testAggregateWithEmptyRequiredInput, but I got the following on
> > > my branch rebased on top of master (on CentOS).
> > >
> > > Tests in error:
> > >   TestImpersonationQueries.sequenceFileChainedImpersonationWithView » UserRemote
> > >   TestImpersonationQueries.testMultiLevelImpersonationJoinEachSideReachesMaxUserHops:233->BaseTestQuery.updateClient:222->BaseTestQuery.updateClient:236->BaseTestQuery.updateClient:213 » Rpc
> > >   TestImpersonationQueries.testMultiLevelImpersonationExceedsMaxUserHops:219->BaseTestQuery.updateClient:222->BaseTestQuery.updateClient:236->BaseTestQuery.updateClient:213 » IllegalState
> > >   TestImpersonationQueries.avroChainedImpersonationWithView:280->BaseTestImpersonation.createView:186->BaseTestQuery.updateClient:222->BaseTestQuery.updateClient:236->BaseTestQuery.updateClient:213 » IllegalState
> > >   TestImpersonationQueries.testDirectImpersonation_HasGroupReadPermissions:186->BaseTestQuery.updateClient:222->BaseTestQuery.updateClient:236->BaseTestQuery.updateClient:213 » IllegalState
> > >   TestImpersonationQueries.testDirectImpersonation_NoReadPermissions:196->BaseTestQuery.updateClient:222->BaseTestQuery.updateClient:236->BaseTestQuery.updateClient:213 » IllegalState
> > >   TestImpersonationQueries.testMultiLevelImpersonationEqualToMaxUserHops:210->BaseTestQuery.updateClient:222->BaseTestQuery.updateClient:236->BaseTestQuery.updateClient:213 » IllegalState
> > >
> > > exception details --->
> > >
> > > testMultiLevelImpersonationExceedsMaxUserHops(org.apache.drill.exec.impersonation.TestImpersonationQueries)  Time elapsed: 0.008 sec  <<< ERROR!
> > > java.lang.IllegalStateException: failed to create a child event loop
> > >   at sun.nio.ch.IOUtil.makePipe(Native Method)
> > >   at io.netty.channel.nio.NioEventLoop.openSelector(NioEventLoop.java:126)
> > >   at io.netty.channel.nio.NioEventLoop.<init>(NioEventLoop.java:120)
> > >   at io.netty.channel.nio.NioEventLoopGroup.newChild(NioEventLoopGroup.java:87)
> > >   at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:64)
> > >   at io.netty.channel.MultithreadEventLoopGroup.<init>(MultithreadEventLoopGroup.java:49)
> > >   at io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:61)
> > >   at io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:52)
> > >   at org.apache.drill.exec.rpc.TransportCheck.createEventLoopGroup(TransportCheck.java:74)
> > >   at org.apache.drill.exec.client.DrillClient.createEventLoop(DrillClient.java:239)
> > >   at org.apache.drill.exec.client.DrillClient.connect(DrillClient.java:220)
> > >   at org.apache.drill.exec.client.DrillClient.connect(DrillClient.java:178)
> > >   at org.apache.drill.QueryTestUtil.createClient(QueryTestUtil.java:67)
> > >   at org.apache.drill.BaseTestQuery.updateClient(BaseTestQuery.java:213)
> > >   at org.apache.drill.BaseTestQuery.updateClient(BaseTestQuery.java:236)
> > >
> > >
> > > My gut's telling me that we are creating too many NioEventLoopGroups.
> > > Did we make any recent changes around RPC that could cause this?
> > >
> > > -Hanifi
> > >
> > >
> > > On Fri, Nov 6, 2015 at 3:58 PM, Jacques Nadeau <jacques@dremio.com> wrote:
> > >
> > > > Do you have that other output/stack trace I asked about? If we can also
> > > > see the IllegalReferenceCount on something other than the JDBC client
> > > > close method, that would be helpful.
> > > >
> > > > --
> > > > Jacques Nadeau
> > > > CTO and Co-Founder, Dremio
> > > >
> > > > On Fri, Nov 6, 2015 at 2:48 PM, Jinfeng Ni <jinfengni99@gmail.com> wrote:
> > > >
> > > > > I just re-ran, and the previous 4 failures are gone. But it failed
> > > > > with two new ones:
> > > > >
> > > > > Tests in error:
> > > > >   TestSqlStdBasedAuthorization.org.apache.drill.exec.impersonation.hive.TestSqlStdBasedAuthorization » UserRemote
> > > > >   TestStorageBasedHiveAuthorization.org.apache.drill.exec.impersonation.hive.TestStorageBasedHiveAuthorization » UserRemote
> > > > >
> > > > > I restarted the machine; not many applications are running and
> > > > > there should be enough memory. Some days back, I got a clean run
> > > > > on the same machine.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Nov 6, 2015 at 2:39 PM, Jacques Nadeau <jacques@dremio.com> wrote:
> > > > > > Can you provide the complete output for this failure:
> > > > > >
> > > > > > TestAggregateFunctions.testAggregateWithEmptyRequiredInput:237 » IllegalReferenceCount
> > > > > >
> > > > > > I haven't seen the other issues. The last one looks like the system was
> > > > > > having an issue since thread creation failure is usually an OS problem.
> > > > > > Was your system under resourced?
> > > > > >
> > > > > > --
> > > > > > Jacques Nadeau
> > > > > > CTO and Co-Founder, Dremio
> > > > > >
> > > > > > On Fri, Nov 6, 2015 at 12:55 PM, Jinfeng Ni <jinfengni99@gmail.com> wrote:
> > > > > >
> > > > > >> I'm seeing unit test failures when running "mvn clean install" on the
> > > > > >> Drill master branch, on Mac.
> > > > > >>
> > > > > >> The first one seems to be issue #3 in Jacques's list. The last three
> > > > > >> seem to be different from the 4 issues. Has anyone seen these failures
> > > > > >> before, or did they just happen on my Mac? Thanks.
> > > > > >>
> > > > > >>
> > > > > >> =================================================
> > > > > >> git log
> > > > > >> commit 1a24233475ca46aaf2a49a5624b4042f088382f4
> > > > > >>
> > > > > >>
> > > > > >> Tests in error:
> > > > > >>   TestAggregateFunctions.testAggregateWithEmptyRequiredInput:237 » IllegalReferenceCount
> > > > > >>   TestImpersonationQueries.testMultiLevelImpersonationEqualToMaxUserHops » UserRemote
> > > > > >>   TestImpersonationQueries.removeMiniDfsBasedStorage:294->BaseTestImpersonation.stopMiniDfsCluster:151 » OutOfMemory
> > > > > >>   TestImpersonationQueries>BaseTestQuery.closeClient:260 » OutOfMemory unable to...
> > > > > >>
> > > > > >> Tests run: 1483, Failures: 0, Errors: 4, Skipped: 118
> > > > > >>
> > > > > >> [INFO] ------------------------------------------------------------------------
> > > > > >> [INFO] Reactor Summary:
> > > > > >> [INFO]
> > > > > >> [INFO] Apache Drill Root POM .............................. SUCCESS [ 8.440 s]
> > > > > >> [INFO] tools/Parent Pom ................................... SUCCESS [ 0.631 s]
> > > > > >> [INFO] tools/freemarker codegen tooling ................... SUCCESS [ 5.236 s]
> > > > > >> [INFO] Drill Protocol ..................................... SUCCESS [ 5.839 s]
> > > > > >> [INFO] Common (Logical Plan, Base expressions) ............ SUCCESS [ 10.831 s]
> > > > > >> [INFO] contrib/Parent Pom ................................. SUCCESS [ 0.815 s]
> > > > > >> [INFO] contrib/data/Parent Pom ............................ SUCCESS [ 0.331 s]
> > > > > >> [INFO] contrib/data/tpch-sample-data ...................... SUCCESS [ 2.838 s]
> > > > > >> [INFO] exec/Parent Pom .................................... SUCCESS [ 0.635 s]
> > > > > >> [INFO] exec/Java Execution Engine ......................... FAILURE [12:05 min]
> > > > > >> [INFO] exec/JDBC Driver using dependencies ................ SKIPPED
> > > > > >> [INFO] JDBC JAR with all dependencies ..................... SKIPPED
> > > > > >> [INFO] contrib/mongo-storage-plugin ....................... SKIPPED
> > > > > >>
> > > > > >> Tests run: 11, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 17.042 sec <<< FAILURE! - in org.apache.drill.exec.impersonation.TestImpersonationQueries
> > > > > >>
> > > > > >> testMultiLevelImpersonationEqualToMaxUserHops(org.apache.drill.exec.impersonation.TestImpersonationQueries)  Time elapsed: 0.099 sec  <<< ERROR!
> > > > > >> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: OutOfMemoryError: unable to create new native thread
> > > > > >>
> > > > > >>
> > > > > >> [Error Id: a826ac5d-e278-49bc-8f92-fdf241d0e634 on 10.250.50.52:31010]
> > > > > >>   at org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:118)
> > > > > >>   at org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:112)
> > > > > >>   at org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:47)
> > > > > >>   at org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:32)
> > > > > >>   at org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:68)
> > > > > >>   at org.apache.drill.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:390)
> > > > > >>   at org.apache.drill.common.SerializedExecutor$RunnableProcessor.run(SerializedExecutor.java:105)
> > > > > >>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > > > > >>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > > > > >>   at java.lang.Thread.run(Thread.java:744)
> > > > > >>
> > > > > >> On Fri, Nov 6, 2015 at 9:42 AM, Jacques Nadeau <jacques@dremio.com> wrote:
> > > > > >> > It seems like we have four potentially show-stopping issues at the
> > > > > >> > moment:
> > > > > >> >
> > > > > >> > DRILL-4042: Windows build doesn't include right version of Hadoop
> > > > > >> > dependencies
> > > > > >> > DRILL-3480: Random message propagation timeouts
> > > > > >> > DRILL-4041: Reference count issue
> > > > > >> > DRILL-4046: Performance regression for some TPCH queries
> > > > > >> >
> > > > > >> > Proposed next steps:
> > > > > >> >
> > > > > >> > DRILL-4042 has a clear fix and reproduction. Patrick, do you think you
> > > > > >> > can have a fix up for this shortly?
> > > > > >> >
> > > > > >> > For 3480 & 4041, consistent reproductions are missing. It would be
> > > > > >> > great if everybody could try to help find reproductions for these
> > > > > >> > issues. I think we should take stock again at the end of the day to
> > > > > >> > decide next steps and whether we want to hold the release for these.
> > > > > >> >
> > > > > >> > For 4046: I've heard that there are some performance regressions
> > > > > >> > around a couple of queries but the current symptoms don't make a lot
> > > > > >> > of sense. I'd like to collect some more data here and then decide
> > > > > >> > next steps.
> > > > > >> >
> > > > > >> > Let's see if we can get repros for each of the inconsistent issues and
> > > > > >> > check in again EOD.
> > > > > >> >
> > > > > >> > thanks,
> > > > > >> > Jacques
> > > > > >> >
> > > > > >> > --
> > > > > >> > Jacques Nadeau
> > > > > >> > CTO and Co-Founder, Dremio
> > > > > >> >
> > > > > >> > On Thu, Nov 5, 2015 at 3:36 PM, Aditya <adityakishore@gmail.com> wrote:
> > > > > >> >
> > > > > >> >> Ran into another one - DRILL-4042
> > > > > >> >> <https://issues.apache.org/jira/browse/DRILL-4042>.
> > > > > >> >>
> > > > > >> >> On Thu, Nov 5, 2015 at 1:48 PM, Jacques Nadeau <jacques@dremio.com> wrote:
> > > > > >> >>
> > > > > >> >>> Yeah, I think that sinks it. Weird how Rat complains only on windows...
> > > > > >> >>>
> > > > > >> >>> Let's take the rest of the business day to test the current candidate to
> > > > > >> >>> make sure that we don't spin extra builds unnecessarily.
> > > > > >> >>>
> > > > > >> >>> thanks,
> > > > > >> >>> Jacques
> > > > > >> >>>
> > > > > >> >>>
> > > > > >> >>> --
> > > > > >> >>> Jacques Nadeau
> > > > > >> >>> CTO and Co-Founder, Dremio
> > > > > >> >>>
> > > > > >> >>> On Thu, Nov 5, 2015 at 1:24 PM, Aditya <adityakishore@gmail.com> wrote:
> > > > > >> >>>
> > > > > >> >>>> Oh, I thought only the master/trunk branch was protected, but now I see
> > > > > >> >>>> the mail from David Nalley.
> > > > > >> >>>>
> > > > > >> >>>> In that case, I propose that the release manager push the branch
> > > > > >> >>>> to his/her private fork and put the URL/hash in the vote starter thread.
> > > > > >> >>>>
> > > > > >> >>>> I was looking at the commit history to determine whether the
> > > > > >> >>>> candidate suffers from DRILL-4040, which evidently it does.
> > > > > >> >>>>
> > > > > >> >>>> -1 as the build from source is failing.
> > > > > >> >>>>
> > > > > >> >>>> [1] https://issues.apache.org/jira/browse/DRILL-4040
> > > > > >> >>>>
> > > > > >> >>>> On Thu, Nov 5, 2015 at 1:12 PM, Jacques Nadeau <jacques@dremio.com> wrote:
> > > > > >> >>>>
> > > > > >> >>>>> I'm not sure what to do here. INFRA just changed the Git behavior so it
> > > > > >> >>>>> is no longer possible to delete branches. I generally don't like to have
> > > > > >> >>>>> failed branches in a release history (otherwise you get a release branch
> > > > > >> >>>>> with all these maven forward/backwards commits). As such, I would
> > > > > >> >>>>> overwrite candidate branches historically (dropping the failed release
> > > > > >> >>>>> commits).
> > > > > >> >>>>>
> > > > > >> >>>>> The commit is here right now:
> > > > > >> >>>>> https://github.com/jacques-n/drill/tree/drill-1.3.0-rc0
> > > > > >> >>>>>
> > > > > >> >>>>> The parent of 4822068a006aeb251b686d2b51871573c4337e60
> > > > > >> >>>>> is
> > > > > >> >>>>> 3dedc158f3af8ec8320a9cd336b2798b09cc9a8d (the tip of master)
(the tip of
> master)
> > > > > >> >>>>>
> > > > > >> >>>>>
> > > > > >> >>>>>
> > > > > >> >>>>> --
> > > > > >> >>>>> Jacques Nadeau
> > > > > >> >>>>> CTO and Co-Founder, Dremio
> > > > > >> >>>>>
> > > > > >> >>>>> On Thu, Nov 5, 2015 at 1:01 PM, Aditya <adityakishore@gmail.com> wrote:
> > > > > >> >>>>>
> > > > > >> >>>>>> I am having trouble determining the git commit this release is based
> > > > > >> >>>>>> on as I could not find the id
> > > > > >> >>>>>> (4822068a006aeb251b686d2b51871573c4337e60) captured in the
> > > > > >> >>>>>> git.properties bundled in the tarballs in the Drill Git repository.
> > > > > >> >>>>>>
> > > > > >> >>>>>> Most likely the last commit is only in your local branch and since
> > > > > >> >>>>>> git.properties captures only the last commit, it is impossible to
> > > > > >> >>>>>> find the parent commit.
> > > > > >> >>>>>>
> > > > > >> >>>>>> Would it make sense to push the release branch?
> > > > > >> >>>>>>
> > > > > >> >>>>>> aditya...
> > > > > >> >>>>>>
> > > > > >> >>>>>> On Wed, Nov 4, 2015 at 11:08 PM, Jacques Nadeau <jacques@dremio.com> wrote:
> > > > > >> >>>>>>
> > > > > >> >>>>>> > Hey Everybody,
> > > > > >> >>>>>> >
> > > > > >> >>>>>> > I'm happy to propose a new release of Apache Drill, version 1.3.0.
> > > > > >> >>>>>> > This is the first release candidate (rc0).  It covers a total of
> > > > > >> >>>>>> > ~50 closed JIRAs [1].
> > > > > >> >>>>>> >
> > > > > >> >>>>>> > The tarball artifacts are hosted at [2] and the maven artifacts are
> > > > > >> >>>>>> > hosted at [3].
> > > > > >> >>>>>> >
> > > > > >> >>>>>> > The vote will be open for 72 hours ending at 11PM Pacific,
> > > > > >> >>>>>> > November 7, 2015.
> > > > > >> >>>>>> >
> > > > > >> >>>>>> > [ ] +1
> > > > > >> >>>>>> > [ ] +0
> > > > > >> >>>>>> > [ ] -1
> > > > > >> >>>>>> >
> > > > > >> >>>>>> > thanks,
> > > > > >> >>>>>> > Jacques
> > > > > >> >>>>>> >
> > > > > >> >>>>>> > [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&version=12332946
> > > > > >> >>>>>> > [2] http://people.apache.org/~jacques/apache-drill-1.3.0.rc0/
> > > > > >> >>>>>> > [3] https://repository.apache.org/content/repositories/orgapachedrill-1013/
> > > > > >> >>>>>> >
> > > > > >> >>>>>>
> > > > > >> >>>>>
> > > > > >> >>>>>
> > > > > >> >>>>
> > > > > >> >>>
> > > > > >> >>
> > > > > >>
> > > > >
> > > >
> > >
> >
>
