flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Gabor Gevay (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-3291) Object reuse bug in MergeIterator.HeadStream.nextHead
Date Wed, 03 Feb 2016 14:23:39 GMT

    [ https://issues.apache.org/jira/browse/FLINK-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15130452#comment-15130452

Gabor Gevay commented on FLINK-3291:

OK, sorry, I think I have now understood what you meant by ReduceDriver.run not tracking the
object returned from the iterator call. The problem here is that after 0a8df6d513fa59d650ff875bdf3a1613d0f14af5,
I mustn't modify an object that I have given to an iterator.next call as a reuse object, because
MergeIterator.HeadStream.nextHead saves a reference to it, and expects that object to not
change. But this seems like a rather scary requirement, and I wouldn't be sure that some other
code besides ReduceDriver somewhere doesn't also violate it.

I think that the root cause of these issues, is that the documentation about object reuse
[1] is rather inadequate in clearly stating what are the contracts in this area, so I tried
to put together a Google Doc about this: [2]. Could you please look at it, and tell me how
much it aligns with your way of thinking about object reuse?

[1] https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/index.html#object-reuse-behavior
[2] https://docs.google.com/document/d/1cgkuttvmj4jUonG7E2RdFVjKlfQDm_hE6gvFcgAfzXg/edit

> Object reuse bug in MergeIterator.HeadStream.nextHead
> -----------------------------------------------------
>                 Key: FLINK-3291
>                 URL: https://issues.apache.org/jira/browse/FLINK-3291
>             Project: Flink
>          Issue Type: Bug
>          Components: Distributed Runtime
>    Affects Versions: 1.0.0
>            Reporter: Gabor Gevay
>            Assignee: Gabor Gevay
>            Priority: Critical
> MergeIterator.HeadStream.nextHead saves a reference into `this.head` of the `reuse` object
that it got as an argument. This object might be modified later by the caller.
> This actually happens when ReduceDriver.run calls input.next (which will actually be
MergeIterator.next(E reuse)) in the inner while loop of the objectReuseEnabled branch, and
that calls top.nextHead with the reference that it got from ReduceDriver, which erroneously
saves the reference, and then ReduceDriver later uses that same object for doing the reduce.
> Another way in which this fails is when MergeIterator.next(E reuse) gives `reuse` to
different `top`s in different calls, and then the heads end up being the same object.
> You can observe the latter situation in action by running ReducePerformance here:
> https://github.com/ggevay/flink/tree/merge-iterator-object-reuse-bug
> Set memory to -Xmx200m (so that the MergeIterator actually has merging to do), put a
breakpoint at the beginning of MergeIterator.next(reuse), and then watch `reuse`, and the
heads of the first two elements of `this.heap` in the debugger. They will get to be the same
object after hitting continue about 6 times.
> You can also look at the count that is printed at the end, which shouldn't be larger
than the key range. Also, if you look into the output file /tmp/xxxobjectreusebug, for example
the key 999977 appears twice.
> The good news is that I think I can see an easy fix that doesn't affect performance:
MergeIterator.HeadStream could have a reuse object of its own as a member, and give that to
iterator.next in nextHead(E reuse). And then we wouldn't need the overload of nextHead that
has the reuse parameter, and MergeIterator.next(E reuse) could just call its other overload.

This message was sent by Atlassian JIRA

View raw message