drill-dev mailing list archives

From paul-rogers <...@git.apache.org>
Subject [GitHub] drill pull request #906: DRILL-5546: Handle schema change exception failure ...
Date Wed, 23 Aug 2017 01:09:12 GMT
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/906#discussion_r134627805
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectRecordBatch.java ---
    @@ -768,4 +765,73 @@ else if (exprHasPrefix && refHasPrefix) {
           }
         }
       }
    +
    +  /**
    +   * handle FAST NONE specially when Project for query output. This happens when input returns a
    +   * FAST NONE directly ( input does not return any batch with schema/data).
    +   *
    +   * Project operator has to return a batch with schema derived using the following 3 rules:
    +   *  Case 1:  *  ==>  expand into an empty list of columns.
    +   *  Case 2:  regular column reference ==> treat as nullable-int column
    +   *  Case 3:  expressions => Call ExpressionTreeMaterialization over an empty vector contain.
    --- End diff --
    
    Is this description confusing two different scenarios?
    
    1. Empty result set, but a schema is provided. (The Scan Batch changes go out of their way to provide a schema when possible.)
    2. Null result set: no rows and no schema.
    
    The rules in the Javadoc seem to relate to the second case: there are no columns to project.
    
    But what do we do in the first case, when we have a schema but no rows? We should do exactly what we'd do if we had data: match up columns, insert nullable ints for missing columns, etc.
    
    Now, visualize the null result set as an empty result set with no schema. *Exactly the same* rules apply. We match up columns (for a wildcard or a project list), but will find none. So, we'll replace every reference with a nullable int.
    
    The point is, there should be only one code path, not two, and that one code path should gracefully handle the case in which the schema is empty.
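
To make the "one code path" idea concrete, here is a minimal sketch, not Drill's actual code. The class `ProjectionSketch`, the method `resolve`, and the use of plain strings for column types are all hypothetical simplifications; expressions (Case 3 in the Javadoc) are left out. It only illustrates that the same loop can serve a populated schema and an empty one.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for Project's column resolution.
// None of these names come from Drill; types are modeled as plain strings.
class ProjectionSketch {

  /** Type assumed for a projected column that the input schema lacks. */
  static final String NULLABLE_INT = "NULLABLE INT";
  static final String WILDCARD = "*";

  /**
   * Resolves a project list against an input schema (column name to type).
   * The same loop handles data with a schema, an empty result set with a
   * schema, and a null result set modeled as an empty schema.
   */
  static Map<String, String> resolve(List<String> projectList,
                                     Map<String, String> inputSchema) {
    Map<String, String> output = new LinkedHashMap<>();
    for (String ref : projectList) {
      if (WILDCARD.equals(ref)) {
        // Wildcard expands to whatever the input has: nothing if the schema is empty.
        output.putAll(inputSchema);
      } else {
        // Matched columns keep their type; unmatched ones become nullable int.
        output.put(ref, inputSchema.getOrDefault(ref, NULLABLE_INT));
      }
    }
    return output;
  }
}
```

With an empty `inputSchema`, a wildcard yields no columns and every named reference falls through to the nullable-int default, so no separate branch for the no-schema case is needed in this sketch.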
    
    That said, it is likely true that debugging the existing code path would be tedious, and that creating a new code path would be faster. I wonder what that does to ongoing maintenance costs, however, since future developers must not only understand the original path but also maintain the parallel "fast none" path.

