drill-dev mailing list archives

From Jacques Nadeau <jacq...@dremio.com>
Subject Re: [VOTE] Release Apache Drill 1.3.0 (rc0)
Date Sat, 07 Nov 2015 02:59:25 GMT
Ok, display issue. I lied.

It looks like this is constrained to the status thread. However, we're
probably creating several hundred over the course of the tests (since we
don't restart the jvm).
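
To illustrate the accumulation described above, here is a minimal sketch. It is not Drill's code: the class StatusThreadLeakSketch, the FakeService type and its methods are hypothetical stand-ins. It starts one daemon "status" thread per service and shows that such threads stay alive across repeated start/shutdown cycles in a single JVM unless shutdown stops them explicitly. Being a daemon only means the thread will not block JVM exit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from Drill.
class StatusThreadLeakSketch {

    /** Stand-in for a service that owns one daemon "status" thread. */
    static final class FakeService {
        private final Thread statusThread;

        FakeService() {
            statusThread = new Thread(() -> {
                try {
                    while (true) {
                        Thread.sleep(50); // pretend to publish status periodically
                    }
                } catch (InterruptedException e) {
                    // Interrupted: fall through and let the thread terminate.
                }
            }, "fake-status");
            statusThread.setDaemon(true); // daemon: does not keep the JVM alive
            statusThread.start();
        }

        /** Leaky shutdown: forgets about the status thread. */
        void closeLeaky() {
        }

        /** Clean shutdown: interrupts the status thread and waits for it to exit. */
        void closeClean() {
            statusThread.interrupt();
            try {
                statusThread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the caller's interrupt
            }
        }

        boolean statusThreadAlive() {
            return statusThread.isAlive();
        }
    }

    /** Starts and shuts down n services in this JVM; returns how many status threads survive. */
    static int survivingThreads(int n, boolean clean) {
        List<FakeService> services = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            FakeService service = new FakeService();
            if (clean) {
                service.closeClean();
            } else {
                service.closeLeaky();
            }
            services.add(service);
        }
        int alive = 0;
        for (FakeService service : services) {
            if (service.statusThreadAlive()) {
                alive++;
            }
        }
        // Stop everything so the sketch itself does not leak threads.
        for (FakeService service : services) {
            service.closeClean();
        }
        return alive;
    }

    public static void main(String[] args) {
        System.out.println("leaky shutdown, surviving threads: " + survivingThreads(20, false));
        System.out.println("clean shutdown, surviving threads: " + survivingThreads(20, true));
    }
}
```

With the leaky shutdown every status thread is still alive after its service is "closed". In a test suite that reuses one JVM, that is how a single thread per service can grow into hundreds.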

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Fri, Nov 6, 2015 at 6:56 PM, Sudheesh Katkam <skatkam@maprtech.com>
wrote:

> But the status thread is a daemon. So the Drillbit doesn't have to stop
> it, right?
>
> - Sudheesh
>
> > On Nov 6, 2015, at 6:44 PM, Jacques Nadeau <jacques@dremio.com> wrote:
> >
> > I see that we're bleeding WorkManager status threads that aren't shut down
> > when the Drillbit is shut down.
> >
> > I'll get a patch together.
> >
> > --
> > Jacques Nadeau
> > CTO and Co-Founder, Dremio
> >
> >> On Fri, Nov 6, 2015 at 4:31 PM, Hanifi Gunes <hgunes@maprtech.com> wrote:
> >>
> >> Looks like we are possibly leaking some threads. Investigating.
> >>
> >>> On Fri, Nov 6, 2015 at 4:25 PM, Jacques Nadeau <jacques@dremio.com> wrote:
> >>>
> >>> Hmm.. that is quite strange. I wonder if we need to look at thread counts
> >>> on the daemon.
> >>>
> >>> We haven't changed how we create but there were changes to shutdown
> >>> (although I can't imagine why that would be a problem).
> >>>
> >>> --
> >>> Jacques Nadeau
> >>> CTO and Co-Founder, Dremio
> >>>
> >>>> On Fri, Nov 6, 2015 at 4:11 PM, Hanifi Gunes <hgunes@maprtech.com> wrote:
> >>>>
> >>>> Not the testAggregateWithEmptyRequiredInput but I got the following on
> >>>> my branch rebased on top of master -- @CentOS.
> >>>>
> >>>> Tests in error:
> >>>>   TestImpersonationQueries.sequenceFileChainedImpersonationWithView » UserRemote
> >>>>   TestImpersonationQueries.testMultiLevelImpersonationJoinEachSideReachesMaxUserHops:233->BaseTestQuery.updateClient:222->BaseTestQuery.updateClient:236->BaseTestQuery.updateClient:213 » Rpc
> >>>>   TestImpersonationQueries.testMultiLevelImpersonationExceedsMaxUserHops:219->BaseTestQuery.updateClient:222->BaseTestQuery.updateClient:236->BaseTestQuery.updateClient:213 » IllegalState
> >>>>   TestImpersonationQueries.avroChainedImpersonationWithView:280->BaseTestImpersonation.createView:186->BaseTestQuery.updateClient:222->BaseTestQuery.updateClient:236->BaseTestQuery.updateClient:213 » IllegalState
> >>>>   TestImpersonationQueries.testDirectImpersonation_HasGroupReadPermissions:186->BaseTestQuery.updateClient:222->BaseTestQuery.updateClient:236->BaseTestQuery.updateClient:213 » IllegalState
> >>>>   TestImpersonationQueries.testDirectImpersonation_NoReadPermissions:196->BaseTestQuery.updateClient:222->BaseTestQuery.updateClient:236->BaseTestQuery.updateClient:213 » IllegalState
> >>>>   TestImpersonationQueries.testMultiLevelImpersonationEqualToMaxUserHops:210->BaseTestQuery.updateClient:222->BaseTestQuery.updateClient:236->BaseTestQuery.updateClient:213 » IllegalState
> >>>>
> >>>> exception details --->
> >>>> testMultiLevelImpersonationExceedsMaxUserHops(org.apache.drill.exec.impersonation.TestImpersonationQueries)
> >>>> Time elapsed: 0.008 sec  <<<   ERROR!
> >>>> java.lang.IllegalStateException: failed to create a child event loop
> >>>>  at sun.nio.ch.IOUtil.makePipe(Native Method)
> >>>>  at io.netty.channel.nio.NioEventLoop.openSelector(NioEventLoop.java:126)
> >>>>  at io.netty.channel.nio.NioEventLoop.<init>(NioEventLoop.java:120)
> >>>>  at io.netty.channel.nio.NioEventLoopGroup.newChild(NioEventLoopGroup.java:87)
> >>>>  at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:64)
> >>>>  at io.netty.channel.MultithreadEventLoopGroup.<init>(MultithreadEventLoopGroup.java:49)
> >>>>  at io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:61)
> >>>>  at io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:52)
> >>>>  at org.apache.drill.exec.rpc.TransportCheck.createEventLoopGroup(TransportCheck.java:74)
> >>>>  at org.apache.drill.exec.client.DrillClient.createEventLoop(DrillClient.java:239)
> >>>>  at org.apache.drill.exec.client.DrillClient.connect(DrillClient.java:220)
> >>>>  at org.apache.drill.exec.client.DrillClient.connect(DrillClient.java:178)
> >>>>  at org.apache.drill.QueryTestUtil.createClient(QueryTestUtil.java:67)
> >>>>  at org.apache.drill.BaseTestQuery.updateClient(BaseTestQuery.java:213)
> >>>>  at org.apache.drill.BaseTestQuery.updateClient(BaseTestQuery.java:236)
> >>>>
> >>>>
> >>>> My gut's telling me that we are creating too many NioEventLoopGroups.
> >>>> Did we make any recent changes around RPC causing this?
> >>>>
> >>>> -Hanifi
> >>>>
> >>>>
> >>>>> On Fri, Nov 6, 2015 at 3:58 PM, Jacques Nadeau <jacques@dremio.com> wrote:
> >>>>>
> >>>>> Do you have that other output/stack trace I asked about? If we can also see
> >>>>> the IllegalReferenceCount on something other than the JDBC client close
> >>>>> method, that would be helpful.
> >>>>>
> >>>>> --
> >>>>> Jacques Nadeau
> >>>>> CTO and Co-Founder, Dremio
> >>>>>
> >>>>>> On Fri, Nov 6, 2015 at 2:48 PM, Jinfeng Ni <jinfengni99@gmail.com> wrote:
> >>>>>>
> >>>>>> I just re-ran, and the previous 4 failures are gone. But it failed
> >>>>>> with two new ones:
> >>>>>>
> >>>>>> Tests in error:
> >>>>>>   TestSqlStdBasedAuthorization.org.apache.drill.exec.impersonation.hive.TestSqlStdBasedAuthorization » UserRemote
> >>>>>>   TestStorageBasedHiveAuthorization.org.apache.drill.exec.impersonation.hive.TestStorageBasedHiveAuthorization » UserRemote
> >>>>>>
> >>>>>> I restarted the machine, and there are not many applications
> >>>>>> running, so memory should be enough. At least some days back, I
> >>>>>> got a clean run on the same machine.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Fri, Nov 6, 2015 at 2:39 PM, Jacques Nadeau <jacques@dremio.com> wrote:
> >>>>>>> Can you provide the complete output for this failure:
> >>>>>>>
> >>>>>>> TestAggregateFunctions.testAggregateWithEmptyRequiredInput:237 » IllegalReferenceCount
> >>>>>>>
> >>>>>>> I haven't seen the other issues. The last one looks like the system was
> >>>>>>> having an issue, since thread creation failure is usually an OS problem.
> >>>>>>> Was your system under-resourced?
> >>>>>>>
> >>>>>>> --
> >>>>>>> Jacques Nadeau
> >>>>>>> CTO and Co-Founder, Dremio
> >>>>>>>
> >>>>>>> On Fri, Nov 6, 2015 at 12:55 PM, Jinfeng Ni <jinfengni99@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> I'm seeing unit test failures when running "mvn clean install"
> >>>>>>>> on the Drill master branch, on Mac.
> >>>>>>>>
> >>>>>>>> The first one seems to be issue #3 in Jacques's list. The last
> >>>>>>>> three seem to be different from the 4 issues. Has anyone seen these
> >>>>>>>> failures before, or did it just happen on my Mac? Thanks.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> =================================================
> >>>>>>>> git log
> >>>>>>>> commit 1a24233475ca46aaf2a49a5624b4042f088382f4
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Tests in error:
> >>>>>>>>   TestAggregateFunctions.testAggregateWithEmptyRequiredInput:237 » IllegalReferenceCount
> >>>>>>>>   TestImpersonationQueries.testMultiLevelImpersonationEqualToMaxUserHops » UserRemote
> >>>>>>>>   TestImpersonationQueries.removeMiniDfsBasedStorage:294->BaseTestImpersonation.stopMiniDfsCluster:151 » OutOfMemory
> >>>>>>>>   TestImpersonationQueries>BaseTestQuery.closeClient:260 » OutOfMemory unable to...
> >>>>>>>>
> >>>>>>>> Tests run: 1483, Failures: 0, Errors: 4, Skipped: 118
> >>>>>>>>
> >>>>>>>> [INFO] ------------------------------------------------------------------------
> >>>>>>>> [INFO] Reactor Summary:
> >>>>>>>> [INFO]
> >>>>>>>> [INFO] Apache Drill Root POM .............................. SUCCESS [  8.440 s]
> >>>>>>>> [INFO] tools/Parent Pom ................................... SUCCESS [  0.631 s]
> >>>>>>>> [INFO] tools/freemarker codegen tooling ................... SUCCESS [  5.236 s]
> >>>>>>>> [INFO] Drill Protocol ..................................... SUCCESS [  5.839 s]
> >>>>>>>> [INFO] Common (Logical Plan, Base expressions) ............ SUCCESS [ 10.831 s]
> >>>>>>>> [INFO] contrib/Parent Pom ................................. SUCCESS [  0.815 s]
> >>>>>>>> [INFO] contrib/data/Parent Pom ............................ SUCCESS [  0.331 s]
> >>>>>>>> [INFO] contrib/data/tpch-sample-data ...................... SUCCESS [  2.838 s]
> >>>>>>>> [INFO] exec/Parent Pom .................................... SUCCESS [  0.635 s]
> >>>>>>>> [INFO] exec/Java Execution Engine ......................... FAILURE [12:05 min]
> >>>>>>>> [INFO] exec/JDBC Driver using dependencies ................ SKIPPED
> >>>>>>>> [INFO] JDBC JAR with all dependencies ..................... SKIPPED
> >>>>>>>> [INFO] contrib/mongo-storage-plugin ....................... SKIPPED
> >>>>>>>>
> >>>>>>>> Tests run: 11, Failures: 0, Errors: 3, Skipped: 0, Time elapsed:
> >>>>>>>> 17.042 sec <<< FAILURE! - in
> >>>>>>>> org.apache.drill.exec.impersonation.TestImpersonationQueries
> >>>>>>>> testMultiLevelImpersonationEqualToMaxUserHops(org.apache.drill.exec.impersonation.TestImpersonationQueries)
> >>>>>>>> Time elapsed: 0.099 sec  <<< ERROR!
> >>>>>>>> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
> >>>>>>>> OutOfMemoryError: unable to create new native thread
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> [Error Id: a826ac5d-e278-49bc-8f92-fdf241d0e634 on 10.250.50.52:31010]
> >>>>>>>>  at org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:118)
> >>>>>>>>  at org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:112)
> >>>>>>>>  at org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:47)
> >>>>>>>>  at org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:32)
> >>>>>>>>  at org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:68)
> >>>>>>>>  at org.apache.drill.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:390)
> >>>>>>>>  at org.apache.drill.common.SerializedExecutor$RunnableProcessor.run(SerializedExecutor.java:105)
> >>>>>>>>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >>>>>>>>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >>>>>>>>  at java.lang.Thread.run(Thread.java:744)
> >>>>>>>>
> >>>>>>>> On Fri, Nov 6, 2015 at 9:42 AM, Jacques Nadeau <jacques@dremio.com> wrote:
> >>>>>>>>> It seems like we have four potentially show-stopping issues at the
> >>>>>>>>> moment:
> >>>>>>>>>
> >>>>>>>>> DRILL-4042: Windows build doesn't include right version of Hadoop
> >>>>>>>>> dependencies
> >>>>>>>>> DRILL-3480: Random message propagation timeouts
> >>>>>>>>> DRILL-4041: Reference count issue
> >>>>>>>>> DRILL-4046: Performance regression for some TPCH queries
> >>>>>>>>>
> >>>>>>>>> Proposed next steps:
> >>>>>>>>>
> >>>>>>>>> DRILL-4042 has a clear fix and reproduction. Patrick, do you think you can
> >>>>>>>>> have a fix up for this shortly?
> >>>>>>>>>
> >>>>>>>>> For 3480 & 4041, consistent reproductions are missing. It would be
> >>>>>>>>> great if everybody could try to help find reproductions for these issues.
> >>>>>>>>> I think we should take stock again at the end of the day to decide next
> >>>>>>>>> steps and whether we want to hold the release for these.
> >>>>>>>>>
> >>>>>>>>> For 4046: I've heard that there are some performance regressions around a
> >>>>>>>>> couple of queries but the current symptoms don't make a lot of sense. I'd
> >>>>>>>>> like to collect some more data here and then decide next steps.
> >>>>>>>>>
> >>>>>>>>> Let's see if we can get repros for each of the inconsistent issues and
> >>>>>>>>> check in again EOD.
> >>>>>>>>>
> >>>>>>>>> thanks,
> >>>>>>>>> Jacques
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> Jacques Nadeau
> >>>>>>>>> CTO and Co-Founder, Dremio
> >>>>>>>>>
> >>>>>>>>> On Thu, Nov 5, 2015 at 3:36 PM, Aditya <adityakishore@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> Ran into another one - DRILL-4042
> >>>>>>>>>> <https://issues.apache.org/jira/browse/DRILL-4042>.
> >>>>>>>>>>
> >>>>>>>>>> On Thu, Nov 5, 2015 at 1:48 PM, Jacques Nadeau <jacques@dremio.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Yeah, I think that sinks it. Weird how Rat complains only on windows...
> >>>>>>>>>>>
> >>>>>>>>>>> Let's take the rest of the business day to test the current candidate to
> >>>>>>>>>>> make sure that we don't spin extra builds unnecessarily.
> >>>>>>>>>>>
> >>>>>>>>>>> thanks,
> >>>>>>>>>>> Jacques
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> --
> >>>>>>>>>>> Jacques Nadeau
> >>>>>>>>>>> CTO and Co-Founder, Dremio
> >>>>>>>>>>>
> >>>>>>>>>>> On Thu, Nov 5, 2015 at 1:24 PM, Aditya <adityakishore@gmail.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Oh, I thought only the master/trunk branch was protected, but now I see
> >>>>>>>>>>>> the mail from David Nalley.
> >>>>>>>>>>>>
> >>>>>>>>>>>> In that case, I propose that the release manager push the branch
> >>>>>>>>>>>> to his/her private fork and put the URL/hash in the vote starter thread.
> >>>>>>>>>>>>
> >>>>>>>>>>>> The reason I was looking at the commit history was to determine if the
> >>>>>>>>>>>> candidate suffers from DRILL-4040, which evidently it does.
> >>>>>>>>>>>>
> >>>>>>>>>>>> -1 as the build from source is failing.
> >>>>>>>>>>>>
> >>>>>>>>>>>> [1] https://issues.apache.org/jira/browse/DRILL-4040
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Thu, Nov 5, 2015 at 1:12 PM, Jacques Nadeau <jacques@dremio.com> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> I'm not sure what to do here. INFRA just changed the Git behavior so it
> >>>>>>>>>>>>> is no longer possible to delete branches. I generally don't like to have
> >>>>>>>>>>>>> failed branches in a release history (otherwise you get a release branch
> >>>>>>>>>>>>> with all these maven forward/backwards commits). As such, I would overwrite
> >>>>>>>>>>>>> candidate branches historically (dropping the failed release commits).
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The commit is here right now:
> >>>>>>>>>>>>> https://github.com/jacques-n/drill/tree/drill-1.3.0-rc0
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The parent of 4822068a006aeb251b686d2b51871573c4337e60
> >>>>>>>>>>>>> is
> >>>>>>>>>>>>> 3dedc158f3af8ec8320a9cd336b2798b09cc9a8d (the tip of master)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> --
> >>>>>>>>>>>>> Jacques Nadeau
> >>>>>>>>>>>>> CTO and Co-Founder, Dremio
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Thu, Nov 5, 2015 at 1:01 PM, Aditya <adityakishore@gmail.com> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> I am having trouble determining the git commit this release is based
> >>>>>>>>>>>>>> on as I could not find the id (4822068a006aeb251b686d2b51871573c4337e60)
> >>>>>>>>>>>>>> captured in the git.properties bundled in the tarballs in the Drill Git
> >>>>>>>>>>>>>> repository.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Most likely the last commit is only in your local branch and since
> >>>>>>>>>>>>>> git.properties captures only the last commit, it is impossible to find
> >>>>>>>>>>>>>> the parent commit.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Would it make sense to push the release branch?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> aditya...
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Wed, Nov 4, 2015 at 11:08 PM, Jacques Nadeau <jacques@dremio.com> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Hey Everybody,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I'm happy to propose a new release of Apache Drill, version 1.3.0.
> >>>>>>>>>>>>>>> This is the first release candidate (rc0).  It covers a total of ~50
> >>>>>>>>>>>>>>> closed JIRAs [1].
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> The tarball artifacts are hosted at [2] and the maven artifacts are
> >>>>>>>>>>>>>>> hosted at [3].
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> The vote will be open for 72 hours ending at 11PM Pacific, November 7,
> >>>>>>>>>>>>>>> 2015.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> [ ] +1
> >>>>>>>>>>>>>>> [ ] +0
> >>>>>>>>>>>>>>> [ ] -1
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> thanks,
> >>>>>>>>>>>>>>> Jacques
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&version=12332946
> >>>>>>>>>>>>>>> [2] http://people.apache.org/~jacques/apache-drill-1.3.0.rc0/
> >>>>>>>>>>>>>>> [3] https://repository.apache.org/content/repositories/orgapachedrill-1013/
