drill-issues mailing list archives

From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-6037) List vector can lose data when "promoting" to union
Date Mon, 18 Dec 2017 20:05:00 GMT
Paul Rogers created DRILL-6037:

             Summary: List vector can lose data when "promoting" to union
                 Key: DRILL-6037
                 URL: https://issues.apache.org/jira/browse/DRILL-6037
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.10.0
            Reporter: Paul Rogers

Drill provides a little-known {{ListVector}}, used in the JSON reader, as an alternative
to the {{REPEATED}} data mode that allows array values themselves to be null. That is, the list vector
allows the following:

{a: [10, 20]} {a: null}

(It is unclear if the rest of Drill can handle this extra null state, however.)
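To make the extra null state concrete, here is a minimal Java sketch of a nullable list column. It is a hypothetical, simplified model: the class {{NullableListColumn}} and its methods are illustrative and are not Drill's {{ListVector}} API. Each row gets an offset range into a flattened data store plus a row-level "is set" flag, which is what lets {{a: null}} differ from {{a: []}}. A {{REPEATED}} column has offsets but no such flag, so the closest it can get to a null array is an empty one.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of a nullable list column (not Drill's API).
 * Row i spans data[offsets[i] .. offsets[i + 1]) and has a row-level null flag.
 */
final class NullableListColumn {
    private final List<Integer> offsets = new ArrayList<>(List.of(0)); // start offset of each row, plus end
    private final List<Boolean> isSet = new ArrayList<>();               // false means the whole array is null
    private final List<Long> data = new ArrayList<>();                   // flattened element values

    void appendArray(long... values) {
        for (long v : values) {
            data.add(v);
        }
        isSet.add(true);
        offsets.add(data.size());
    }

    void appendNull() {
        isSet.add(false);
        offsets.add(data.size()); // zero-length entry: the flag, not the length, marks the null
    }

    /** Returns the row's elements, or null if the row itself is null. */
    List<Long> get(int row) {
        if (!isSet.get(row)) {
            return null;
        }
        return new ArrayList<>(data.subList(offsets.get(row), offsets.get(row + 1)));
    }
}
```

Loading {a: [10, 20]} {a: null} is then {{appendArray(10, 20)}} followed by {{appendNull()}}.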

The list vector has another form of magic. It can be "promoted" to a list of (barely supported)
unions. Promotion to union allows the following:

{a: [10, "twenty"]}

Promotion to union is done via a call to {{ListVector.promoteToUnion()}} which appears to
be called only from {{PromotableWriter.promoteToUnion()}}.
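A union slot can be pictured as a per-slot type code plus one value store per member type, all indexed by slot position. The sketch below is hypothetical: the {{UnionColumn}} class and its type codes are illustrative and do not reproduce Drill's {{UnionVector}} layout or its type numbering.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of union storage: one type code per slot plus one store per member type. */
final class UnionColumn {
    // Hypothetical type codes, chosen for illustration only.
    static final byte BIGINT = 1;
    static final byte VARCHAR = 2;

    private final List<Byte> types = new ArrayList<>();      // the "type vector": one code per slot
    private final List<Long> bigints = new ArrayList<>();    // BIGINT member, same length as types
    private final List<String> varchars = new ArrayList<>(); // VARCHAR member, same length as types

    void appendBigInt(long v) {
        types.add(BIGINT);
        bigints.add(v);
        varchars.add(null); // slot unused in the other member
    }

    void appendVarChar(String s) {
        types.add(VARCHAR);
        bigints.add(0L); // slot unused in the other member
        varchars.add(s);
    }

    /** The type code decides which member store holds the slot's value. */
    Object get(int i) {
        byte t = types.get(i);
        if (t == BIGINT) return bigints.get(i);
        if (t == VARCHAR) return varchars.get(i);
        return null;
    }
}
```

In this model, {a: [10, "twenty"]} becomes {{appendBigInt(10)}} then {{appendVarChar("twenty")}}; only the type codes tell a reader which store to consult.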

The {{ListVector.promoteToUnion()}} call itself transforms the list from a list of some type
to a list of unions, with that type as the first union member. However, *it does not* go
back and update the union's type vector with the type of the prior values.
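The effect of the missing step can be reproduced in a small hypothetical model (the {{UnfilledPromotion}} class and its type codes are illustrative, not Drill code). After promotion the prior values still sit in the BIGINT member storage, but the newly allocated type codes are all zero, which this model assumes means null, so values written before promotion read back as null.

```java
/** Hypothetical model of promotion that converts storage but never backfills type codes. */
final class UnfilledPromotion {
    static final byte NULL = 0;    // hypothetical codes: 0 is assumed to mean "null"
    static final byte BIGINT = 1;
    static final byte VARCHAR = 2;
    private static final int CAPACITY = 8; // fixed size for brevity; no bounds checks

    private final long[] bigints = new long[CAPACITY];     // pre-promotion data becomes the BIGINT member
    private final String[] varchars = new String[CAPACITY];
    private byte[] types;                                  // type vector, created only on promotion
    private int count;

    /** Pre-promotion write: plain BIGINT storage, no type vector yet. */
    void appendBigInt(long v) {
        bigints[count++] = v;
    }

    /** Converts to union but, like the gap described above, leaves every type code at 0. */
    void promoteToUnion() {
        types = new byte[CAPACITY]; // Java zero-fills the array, so earlier slots now read as NULL
    }

    /** Post-promotion write: sets the type code for the new slot only. */
    void appendVarChar(String s) {
        varchars[count] = s;
        types[count] = VARCHAR;
        count++;
    }

    Object read(int i) {
        switch (types[i]) {
            case BIGINT: return bigints[i];
            case VARCHAR: return varchars[i];
            default: return null; // NULL code
        }
    }
}
```

Writing 10, promoting, and then writing "twenty" leaves slot 0 reading as null even though 10 is still present in the BIGINT storage.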

That work is done in {{PromotableWriter.promoteToUnion()}}, meaning that other callers (such
as the size-aware writers) must duplicate that functionality or risk losing the values written before
the promotion. The code should be in the vector itself so that {{ListVector.promoteToUnion()}}
"does the right thing" without clients needing to fill in part of the work.

Another feature of lists is that, unlike {{REPEATED}} types, lists allow nulls as list values.
That is, a list can support the following:

{a: [10, null, 20]}
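In a nullable element store, a null element still occupies a slot; only a per-element flag distinguishes it from a real value. The hypothetical sketch below ({{NullableBigIntElements}} is illustrative, not a Drill class) assumes the null slot holds a placeholder 0, which is how a 0 can surface when the flag is ignored.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of list elements that may individually be null. */
final class NullableBigIntElements {
    private final List<Boolean> isSet = new ArrayList<>(); // per-element null flag
    private final List<Long> values = new ArrayList<>();   // raw slot values

    void append(Long v) {
        isSet.add(v != null);
        values.add(v == null ? 0L : v); // a null still takes a slot; placeholder 0 by assumption
    }

    /** Flag-aware read: returns null for null elements. */
    Long get(int i) {
        return isSet.get(i) ? values.get(i) : null;
    }

    /** Raw read that ignores the flag, exposing the placeholder. */
    long rawValue(int i) {
        return values.get(i);
    }
}
```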

The code in {{PromotableWriter.promoteToUnion()}} is wrong: it sets the type of every union value to the
prior type (such as BIGINT in the example above) without checking whether the value is null.
As a result, after promotion to union, the above list will be:

{a: [10, 0, 20]}
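The corruption can be illustrated with a hypothetical backfill that, like the code described above, stamps every prior slot with the BIGINT type code without consulting the null flag. The {{BuggyPromotion}} helper and its type codes are illustrative; the null slot's raw value is assumed to be a placeholder 0.

```java
/** Hypothetical illustration of a backfill that ignores per-element null flags. */
final class BuggyPromotion {
    static final byte NULL = 0;    // hypothetical type codes
    static final byte BIGINT = 1;

    /** Marks the first {@code count} slots as BIGINT without looking at any null flag (the bug). */
    static byte[] backfillIgnoringNulls(int count) {
        byte[] types = new byte[count];
        for (int i = 0; i < count; i++) {
            types[i] = BIGINT; // even a null slot now claims to hold a BIGINT
        }
        return types;
    }

    /** Union read: the type code alone decides whether the slot is a value or null. */
    static Long read(byte[] types, long[] values, int i) {
        return types[i] == BIGINT ? Long.valueOf(values[i]) : null;
    }
}
```

With raw values {10, 0, 20}, where slot 1 was null, every slot reads as a BIGINT, so [10, null, 20] comes back as [10, 0, 20].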

The code should check the null flag on each value. If null, set the union's type vector to
the null marker, else set it to the type of the prior vector.
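A sketch of the suggested fix, again in a hypothetical model ({{ListOfBigInt}} and its type codes are illustrative, not Drill's {{ListVector}}). The backfill lives inside the vector's own {{promoteToUnion()}}, so callers do not have to duplicate it, and it consults each slot's null flag before choosing a type code.

```java
/** Hypothetical model of a BIGINT list that backfills union type codes itself on promotion. */
final class ListOfBigInt {
    static final byte NULL = 0;    // hypothetical type codes
    static final byte BIGINT = 1;

    private final boolean[] isSet; // per-element null flag
    private final long[] values;   // raw slot values (placeholder 0 for nulls)
    private byte[] unionTypes;     // union type vector, allocated on promotion

    ListOfBigInt(Long... elements) {
        isSet = new boolean[elements.length];
        values = new long[elements.length];
        for (int i = 0; i < elements.length; i++) {
            isSet[i] = elements[i] != null;
            values[i] = elements[i] == null ? 0L : elements[i];
        }
    }

    /** Promotes to union and backfills each prior slot: NULL if its flag is unset, else BIGINT. */
    void promoteToUnion() {
        unionTypes = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            unionTypes[i] = isSet[i] ? BIGINT : NULL;
        }
    }

    /** Union read after promotion. */
    Long read(int i) {
        return unionTypes[i] == BIGINT ? Long.valueOf(values[i]) : null;
    }
}
```

Because the vector owns the backfill, any writer that calls {{promoteToUnion()}} gets [10, null, 20] back intact.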

Note: a new version, {{ListVector.convertToUnion()}}, was created for use in the new size-aware
writers. The old version should be fixed or deprecated to avoid data corruption errors.

This message was sent by Atlassian JIRA
