drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5083) IteratorValidator does not handle RecordIterator cleanup call to next( )
Date Tue, 29 Nov 2016 05:54:58 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15704314#comment-15704314

Paul Rogers commented on DRILL-5083:

It seems that only the {{RecordIterator}} has the {{clearInflightBatches( )}} behavior.

Also seems that the ScreenCreator.ScreenRoot iterates though all operators for the fragment
and closes them one by one, from the top down.

But, when we get to the MergeJoin, it is the merge join's close( ) itself that close the left
and right inputs (which, in my test case, were record iterators.) Is this necessary? Won't
the ScreenRoot do the closing? Stepping through the code shows that it will, so we're closing
the same operator multiple times.

The record iterator appears to buffer batches to allow moderate rewinding. If so, then does
it really need to call {{next()}} on its incoming operator? Why not let the incoming operator
clean up it's own mess in close? Code in question:

    while (lastOutcome == IterOutcome.OK || lastOutcome == IterOutcome.OK_NEW_SCHEMA) {
      // Clear all buffers from incoming.
      for (VectorWrapper<?> wrapper : incoming) {
      lastOutcome = incoming.next();

Note that this is a loop to read from incoming while data is ready. But, if we are on a branch
that didn't fail, the incoming does not know that an error condition occurred and will operate
as normal. (Or so it seems.)

Tracing through the code, cleanup seems to stay more-or-less on track through upper (failed)
branch of the first merge join. (The iterator validator throws an exception when the record
iterator tries to call next.)

But, in the lower branch, we have another merge that doesn't know that the query failed. It
is in its start state. As it turns out, this merge is stopped via the STOP code shown above.

Yet another interesting fact is that, somehow, the failed operators got put on the end of
the list of operators to close. That is, the external sort, which ran out of memory and is
holding onto in-memory batches, was closed last, releasing its (now useless) memory as the
last operation. Seems this should be done first (or at least in proper top-down order.)

Perhaps the close methods should be extended to flag redundant closes to catch such problems.

> IteratorValidator does not handle RecordIterator cleanup call to next( )
> ------------------------------------------------------------------------
>                 Key: DRILL-5083
>                 URL: https://issues.apache.org/jira/browse/DRILL-5083
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.8.0
>            Reporter: Paul Rogers
>            Priority: Minor
> This one is very confusing...
> In a test with a MergeJoin and external sort, operators are stacked something like this:
> {code}
> Screen
> - MergeJoin
> - - External Sort
> ...
> {code}
> Using the injector to force a OOM in spill, the external sort threw a UserException up
the stack. This was handed by:
> {code}
> IteratorValidatorBatchIterator.next( )
> RecordIterator.clearInflightBatches( )
> RecordIterator.close( )
> MergeJoinBatch.close( )
> {code}
> Which does the following:
> {code}
>       // Check whether next() should even have been called in current state.
>       if (null != exceptionState) {
>         throw new IllegalStateException(
> {code}
> But, the exceptionState is set, so we end up throwing an IllegalStateException during
> Seems the code should agree: if {{next( )}} will be called during cleanup, then {{next(
)}} should gracefully handle that case.

This message was sent by Atlassian JIRA

View raw message