drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized
Date Fri, 02 Mar 2018 02:23:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16383023#comment-16383023

ASF GitHub Bot commented on DRILL-6125:

Github user ilooner commented on the issue:

    @arina-ielchiieva @vrozov I believe I have a solution. There were several issues with
the original code.
    1. It made incorrect assumptions about how cache invalidation works with java **synchronized**.
    2. It assumed **innerNext** and **close** would be called sequentially.
    I believe this PR fixes these issues now and I have gone into more detail about the problems
    # 1. Incorrect Cache Invalidation Assumptions
    The original code was trying to be smart by trying to reduce synchronization overhead
on **innerNext**. So the code in **innerNext** did not synchronize before changing the partitioner
object since this would be called often. The code in **close()** and ** receivingFragmentFinished()**
synchronized before accessing the partitioner with the intention that this would trigger an
update of the partitioner variable state across all threads. Unfortunately, this assumption
is invalid (see https://stackoverflow.com/questions/22706739/does-synchronized-guarantee-a-thread-will-see-the-latest-value-of-a-non-volatile).
Every thread that accesses a variable must synchronize before accessing a variable in order
to properly invalidate cached data on a core. 
    For example if **Thread A** modifies **Variable 1** then **Thread B** synchronizes before
accessing **Variable 1** then there is no guarantee **Thread B** will see the most updated
value for **Variable 1** since it might .
    ## Solution
    In summary the right thing to do is the simple thing. Make the methods synchronized. Unfortunately
there is no way to outsmart the system and reduce synchronization overhead without causing
race conditions.
    # 2. Concurrent InnerNext and Close Calls
    The original code did not consider the case that innerNext was in the middle of execution
when close was called. It did try to handle the case where **innerNext** could be called after
**close** by setting the **ok** variable. But it didn't even do that right because there was
no synchronization around the **ok** variable.
    ## Solution
    The right thing to do is the simple thing. Make sure the methods are synchronized so close
has to wait until innerNext is done before executing. Also when a query is cancelled the executing
thread should be interrupted the thread running innerNext incase it is on a blocking call.

> PartitionSenderRootExec can leak memory because close method is not synchronized
> --------------------------------------------------------------------------------
>                 Key: DRILL-6125
>                 URL: https://issues.apache.org/jira/browse/DRILL-6125
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.13.0
>            Reporter: Timothy Farkas
>            Assignee: Timothy Farkas
>            Priority: Minor
>             Fix For: 1.13.0
> PartitionSenderRootExec creates a PartitionerDecorator and saves it in the *partitioner*
field. The creation of the partitioner happens in the createPartitioner method. This method
get's called by the main fragment thread. The partitioner field is accessed by the fragment
thread during normal execution but it can also be accessed by the receivingFragmentFinished
method which is a callback executed by the event processor thread. Because multiple threads
can access the partitioner field synchronization is done on creation and on when receivingFragmentFinished.
However, the close method can also be called by the event processor thread, and the close
method does not synchronize before accessing the partitioner field. Since synchronization
is not done the event processor thread may have an old reference to the partitioner when a
query cancellation is done. Since it has an old reference the current partitioner can may
not be cleared and a memory leak may occur.

This message was sent by Atlassian JIRA

View raw message