drill-issues mailing list archives

From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-5134) TestMergeJoinWithSchemaChanges throws exception with paged SV4
Date Sun, 18 Dec 2016 23:55:58 GMT
Paul Rogers created DRILL-5134:

             Summary: TestMergeJoinWithSchemaChanges throws exception with paged SV4
                 Key: DRILL-5134
                 URL: https://issues.apache.org/jira/browse/DRILL-5134
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.9.0
            Reporter: Paul Rogers
            Priority: Minor

The {{TestMergeJoinWithSchemaChanges}} test exercises the in-memory merge sort with union
vectors. (Note that union vectors are not fully supported.)

The merge sort creates an SV4 to hold an index into the sorted results. An SV4 can page
those results, returning them as a series of batches to the downstream operator.
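To make the paging idea concrete, here is a minimal sketch in Java. It is not Drill's SelectionVector4 class; the class and method names are hypothetical. It also assumes that each four-byte entry packs a batch index into the upper 16 bits and a record offset into the lower 16 bits, and it simply slices the sorted index into pages of a chosen size.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not Drill's SelectionVector4 implementation.
final class PagedSv4Sketch {

    // Assumption: upper 16 bits hold the batch index, lower 16 bits hold
    // the record offset within that batch.
    static int encode(int batchIndex, int recordOffset) {
        return (batchIndex << 16) | (recordOffset & 0xFFFF);
    }

    static int batchIndex(int entry) {
        return entry >>> 16;
    }

    static int recordOffset(int entry) {
        return entry & 0xFFFF;
    }

    // Splits an already sorted index into pages of at most pageSize entries,
    // in order, so each page can be handed on as one batch.
    static List<int[]> pages(int[] sortedIndex, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<int[]> result = new ArrayList<>();
        for (int start = 0; start < sortedIndex.length; start += pageSize) {
            int end = Math.min(start + pageSize, sortedIndex.length);
            result.add(Arrays.copyOfRange(sortedIndex, start, end));
        }
        return result;
    }
}
```

Returning everything in a single batch corresponds to calling `pages` with a page size at least as large as the index; the paged case corresponds to a smaller page size.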

When {{TestMergeJoinWithSchemaChanges}} is run using the "managed" external sort and union
vectors, a downstream operator throws an index out of range exception. However, when run with
the "classic" external sort, no such exception is thrown.

The difference is that the classic version returns all rows in a single batch, while the managed
version attempts to return rows in batches of a specified size.

The paging approach works for tests that do not include union vectors, but fails for those
that do include them.

Modifying the managed version to return all results in a single batch does work.

The problem with this workaround is that there will come a size beyond which sorted results
cannot be returned in a single batch and paging will be necessary. The sort buffer can, for
example, be set to 10G, which is too large for a single batch. Or, the sort can process more
than 64K rows, which is also too large for a single batch. In those scenarios, union vectors
with SV4 will fail.
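The row-count limit can be expressed as simple arithmetic. The sketch below is illustrative only: the class and method names are hypothetical, and it assumes "64K rows" means 65,536 rows per batch.

```java
// Hypothetical illustration of the single-batch row limit; not Drill code.
final class BatchLimitSketch {

    // Assumption: the "64K rows" limit is 65,536 rows per batch.
    static final int MAX_ROWS_PER_BATCH = 64 * 1024;

    // True when the single-batch workaround can hold every sorted row.
    static boolean fitsInSingleBatch(long rowCount) {
        return rowCount <= MAX_ROWS_PER_BATCH;
    }

    // Number of batches needed once paging is required (ceiling division).
    static long batchesNeeded(long rowCount) {
        return (rowCount + MAX_ROWS_PER_BATCH - 1) / MAX_ROWS_PER_BATCH;
    }
}
```

Under that assumption, a sort of 65,537 rows already needs two batches, so the single-batch workaround no longer applies.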

Since union vectors are not supported, the workaround described above is used to get the test
to pass. This ticket records the issue for a future time in which we attempt to support union
vectors.
