drill-dev mailing list archives

From "Steven Phillips (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-1781) For complex functions, don't return until schema is known
Date Wed, 26 Nov 2014 12:02:12 GMT
Steven Phillips created DRILL-1781:

             Summary: For complex functions, don't return until schema is known
                 Key: DRILL-1781
                 URL: https://issues.apache.org/jira/browse/DRILL-1781
             Project: Apache Drill
          Issue Type: Bug
            Reporter: Steven Phillips

In the case of complex output functions, it is impossible to determine the output schema until
the actual data is consumed. For example, with convert_from(VARCHAR, 'json'), unlike most
other functions, it is not sufficient to know that the incoming data type is VARCHAR; we actually
need to decode the contents of the record before we can determine whether the output type is
a map, a list, or a primitive type.
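
A minimal Java sketch of why the declared input type is not enough. The class and method names are hypothetical (not Drill APIs), and the type check is deliberately simplified: it looks only at the first non-whitespace character, whereas a real decoder would parse the whole JSON value. Three values that are all VARCHAR decode to three different output kinds.

```java
import java.util.List;

/**
 * Hypothetical illustration (not Drill code): values that share the declared
 * input type VARCHAR can decode to different output kinds, so the output
 * schema of convert_from(..., 'json') depends on the data itself.
 */
public class ComplexOutputKind {

    enum Kind { MAP, LIST, PRIMITIVE }

    // Simplified assumption: inspect only the first non-whitespace character.
    static Kind inferKind(String json) {
        String s = json.trim();
        if (s.startsWith("{")) return Kind.MAP;
        if (s.startsWith("[")) return Kind.LIST;
        return Kind.PRIMITIVE; // number, string, boolean, or null
    }

    public static void main(String[] args) {
        // Every element here has the same declared type: VARCHAR.
        List<String> varcharColumn = List.of("{\"a\": 1}", "[1, 2, 3]", "42");
        for (String value : varcharColumn) {
            System.out.println(value + " -> " + inferKind(value));
        }
    }
}
```

Running main prints MAP, LIST, and PRIMITIVE for the three values, even though a planner looking only at types would see three identical VARCHAR inputs.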

For fast schema return, we worked around this problem by simply assuming the type was Map;
if it turned out to be different, there would be a schema change. This solution is not satisfactory,
as it ends up breaking other functions, such as flatten.
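
The workaround described above can be modeled in a few lines. This is a hypothetical sketch, not Drill's code: it reports MAP as the placeholder output type before any data is read, then checks whether the first real value contradicts that guess, which is the point where a schema change would be forced downstream. The type inference is again a simplification that inspects only the first non-whitespace character.

```java
/**
 * Hypothetical sketch of the old fast-schema workaround (names are
 * illustrative): assume MAP up front, then detect when real data disagrees.
 */
public class AssumeMapWorkaround {

    enum Kind { MAP, LIST, PRIMITIVE }

    // Placeholder type reported in the fast-schema batch before data is seen.
    static final Kind ASSUMED = Kind.MAP;

    // Simplified inference: first non-whitespace character only.
    static Kind inferKind(String json) {
        String s = json.trim();
        if (s.startsWith("{")) return Kind.MAP;
        if (s.startsWith("[")) return Kind.LIST;
        return Kind.PRIMITIVE;
    }

    /** True if the first real value contradicts the assumed MAP schema. */
    static boolean causesSchemaChange(String firstValue) {
        return inferKind(firstValue) != ASSUMED;
    }

    public static void main(String[] args) {
        System.out.println(causesSchemaChange("{\"a\": 1}")); // false: guess was right
        System.out.println(causesSchemaChange("[1, 2, 3]"));  // true: schema change
    }
}
```

Any operator that already committed to the placeholder MAP schema sees a different type once a LIST or primitive value arrives.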

The solution is to continue returning a schema whenever possible, but when it is not possible,
Drill will wait until it is.

For non-blocking operators, Drill will immediately consume the incoming batch, and thus will
not return empty schema batches if there is data to consume. Blocking operators will return
an empty schema batch. If a flatten function occurs downstream from a blocking operator,
it will not be able to return a schema, and thus fast schema return will not happen in this
case.

In the cases where the complex function is not downstream from a blocking operator, fast schema
return should continue to work.
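
The rule described in the last few paragraphs can be summarized as a small decision function. This is a hypothetical Java model with illustrative names, not Drill classes: simple functions can always report a schema from input types; complex-output functions can do so early only when the upstream operator is non-blocking, because behind a blocking operator (a sort, for example) the first batch they receive is an empty schema batch with no records to decode.

```java
/**
 * Hypothetical model of the proposed behavior (names are illustrative, not
 * Drill APIs): can an operator return its output schema early?
 */
public class FastSchemaRule {

    /** The kind of operator feeding the function's operator. */
    enum Upstream {
        NON_BLOCKING, // first batch carries records when there is data
        BLOCKING      // first batch is an empty, schema-only batch
    }

    static boolean fastSchemaPossible(boolean complexOutput, Upstream upstream) {
        if (!complexOutput) {
            return true; // output types follow from input types alone
        }
        // Complex-output functions must decode actual records. Behind a
        // non-blocking operator the first batch can be consumed immediately;
        // behind a blocking operator there is nothing to decode yet, so the
        // schema is only known once the blocking operator emits data.
        return upstream == Upstream.NON_BLOCKING;
    }

    public static void main(String[] args) {
        System.out.println(fastSchemaPossible(false, Upstream.BLOCKING));    // true
        System.out.println(fastSchemaPossible(true, Upstream.NON_BLOCKING)); // true
        System.out.println(fastSchemaPossible(true, Upstream.BLOCKING));     // false
    }
}
```

The only case that loses fast schema return is a complex-output function placed downstream of a blocking operator, matching the behavior described above.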

This message was sent by Atlassian JIRA
