drill-issues mailing list archives

From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (DRILL-5156) Bit-Client thread finds closed allocator in TestDrillbitResilience unit test
Date Fri, 23 Dec 2016 08:08:58 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15772269#comment-15772269
] 

Paul Rogers edited comment on DRILL-5156 at 12/23/16 8:08 AM:
--------------------------------------------------------------

The problem appears to be a bug in {{BootStrapContext}}, which creates two thread pools but
never shuts them down. The two pools are for the "BitClient-n" and "BitServer-n" threads. During
close, the {{BootStrapContext.close()}} method closes the allocator but leaves the threads
running.

Because the threads are left running, the BitClient thread attempts to use the (now closed) allocator
and triggers the {{IllegalStateException}}. To see this, set the exception breakpoint given in the
issue description and leave the thread stopped there. The rest of the Drillbit shuts
down around the suspended thread, showing that the Drillbit did not wait for the thread.
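
The failure mode can be reproduced outside Drill with plain JDK classes. The sketch below is an illustration only: {{ToyAllocator}} and {{ClosedAllocatorRace}} are hypothetical names, not Drill or Netty code, and a latch stands in for the thread parked at the breakpoint. The owner closes the allocator without stopping the pool, and the worker then fails with {{IllegalStateException}}.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicBoolean;

public class ClosedAllocatorRace {

  /** Hypothetical stand-in for an allocator that rejects use after close(). */
  static final class ToyAllocator implements AutoCloseable {
    private final AtomicBoolean open = new AtomicBoolean(true);

    int buffer(int size) {
      if (!open.get()) {
        throw new IllegalStateException("allocator is closed");
      }
      return size;
    }

    @Override
    public void close() {
      open.set(false);
    }
  }

  /** Returns true if the worker hit IllegalStateException after the allocator was closed. */
  static boolean workerSeesClosedAllocator() throws InterruptedException {
    ToyAllocator allocator = new ToyAllocator();
    ExecutorService pool = Executors.newSingleThreadExecutor();
    CountDownLatch allocatorClosed = new CountDownLatch(1);

    Future<Integer> result = pool.submit(() -> {
      allocatorClosed.await();       // plays the role of the thread parked at the breakpoint
      return allocator.buffer(1024); // runs only after the allocator has been closed
    });

    allocator.close();               // owner closes the allocator but leaves the pool running
    allocatorClosed.countDown();     // worker resumes and touches the closed allocator

    try {
      result.get();
      return false;
    } catch (ExecutionException e) {
      return e.getCause() instanceof IllegalStateException;
    } finally {
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("worker saw closed allocator: " + workerSeesClosedAllocator());
  }
}
```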

The fix is simple:

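{code}
  public void close() {
    // Stop the Bit-Client event loop (loop2) before the allocator is closed.
    // Arguments are quiet period, timeout, and their time unit.
    try {
      loop2.shutdownGracefully(0, 0, TimeUnit.SECONDS);
    } catch (Exception e) {
      logger.warn("Failure During Bit-Client shutdown.", e);
    }
    // Likewise stop the Bit-Server event loop (loop).
    try {
      loop.shutdownGracefully(0, 0, TimeUnit.SECONDS);
    } catch (Exception e) {
      logger.warn("Failure During Bit-Server shutdown.", e);
    }
    ...
{code}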

After this fix, the test case runs fine with no {{IllegalStateException}}s.
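
The ordering behind the fix (stop the threads first, then close the allocator) can be sketched with the JDK's {{ExecutorService}}. This is an analogy under stated assumptions, not the Drill change itself: {{ToyContext}} and {{ToyAllocator}} are hypothetical, and the sketch waits explicitly with {{awaitTermination}}, whereas the snippet above relies on Netty's {{shutdownGracefully}}.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class OrderedShutdown {

  /** Hypothetical stand-in for an allocator that rejects use after close(). */
  static final class ToyAllocator {
    private final AtomicBoolean open = new AtomicBoolean(true);

    int buffer(int size) {
      if (!open.get()) {
        throw new IllegalStateException("allocator is closed");
      }
      return size;
    }

    void close() {
      open.set(false);
    }
  }

  /** Hypothetical context that owns both a thread pool and an allocator. */
  static final class ToyContext {
    final ToyAllocator allocator = new ToyAllocator();
    final ExecutorService pool = Executors.newFixedThreadPool(2);

    void close() throws InterruptedException {
      pool.shutdown();                                   // stop accepting new work
      if (!pool.awaitTermination(5, TimeUnit.SECONDS)) { // wait for running and queued tasks
        pool.shutdownNow();                              // last resort: interrupt stragglers
      }
      allocator.close();                                 // only now release the allocator
    }
  }

  /** Submits tasks that allocate, closes the context, and counts closed-allocator failures. */
  static int failuresAfterOrderedClose(int tasks) throws InterruptedException {
    AtomicInteger failures = new AtomicInteger();
    ToyContext context = new ToyContext();
    for (int i = 0; i < tasks; i++) {
      context.pool.execute(() -> {
        try {
          Thread.sleep(10);              // simulate in-flight work
          context.allocator.buffer(64);
        } catch (IllegalStateException e) {
          failures.incrementAndGet();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
    }
    context.close();
    return failures.get();
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("failures: " + failuresAfterOrderedClose(8));
  }
}
```

Swapping the two steps in {{ToyContext.close()}} so the allocator is closed first reintroduces the race described above.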


> Bit-Client thread finds closed allocator in TestDrillbitResilience unit test
> ----------------------------------------------------------------------------
>
>                 Key: DRILL-5156
>                 URL: https://issues.apache.org/jira/browse/DRILL-5156
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor
>
> RPC thread attempts to access a closed allocator during the {{TestDrillbitResilience}} unit test.
> Set a Java exception breakpoint for {{IllegalStateException}}. Run the {{TestDrillbitResilience}} unit tests.
> You will see quite a few exceptions, including the following in a thread called BitClient-1:
> {code}
> RootAllocator(BaseAllocator).assertOpen() line 109
> RootAllocator(BaseAllocator).buffer(int) line 191
> DrillByteBufAllocator.buffer(int) line 49
> DrillByteBufAllocator.ioBuffer(int) line 64
> AdaptiveRecvByteBufAllocator$HandleImpl.allocate(ByteBufAllocator) line 104
> NioSocketChannel$NioSocketChannelUnsafe(...).read() line 117
> ...
> NioEventLoop.run() line 354
> {code}
> The test continues (then fails for some other reason), which is why this is marked as minor. Still, it seems odd that the client thread should attempt to access a closed allocator.
> At this point, it is not clear how we got into this state. The test itself is waiting for a response from the server in the {{failsAfterMSorterSorting}} test.


