drill-dev mailing list archives

From sudheeshkatkam <...@git.apache.org>
Subject [GitHub] drill pull request: DRILL-3623: For limit 0 queries, use a shorter...
Date Tue, 22 Mar 2016 21:05:45 GMT
GitHub user sudheeshkatkam commented on the pull request:

    https://github.com/apache/drill/pull/405#issuecomment-200026538
  
    Thank you for the reviews.
    
    All regression tests passed; I am running unit tests right now.
    
    Note that the `planner.enable_limit0_optimization` option is disabled by default. To
summarize (and document) the limitations:
    
    If, during validation, the planner is able to resolve the types of the columns (i.e. the
types are not late-binding), the shorter execution path is taken. Some types are excluded:
    + DECIMAL type is not fully supported in general.
    + VARBINARY is not fully tested.
    + MAP, ARRAY are currently not exposed to the planner.
    + TINYINT, SMALLINT are defined in the Drill type system but have been turned off for
now.
    + SYMBOL, MULTISET, DISTINCT, STRUCTURED, ROW, OTHER, CURSOR, COLUMN_LIST are Calcite
types that Drill does not currently support and that are not defined in the Drill type list.
    
    Three scenarios in which the planner can resolve types during validation:
    + Queries on Hive tables
    + Queries with explicit casts on table columns, for example: `SELECT CAST(col1 AS BIGINT),
ABS(CAST(col2 AS INTEGER)) FROM table;`
    + Queries on views with casts on table columns
    
    In the latter two cases, the schema of a query with a LIMIT 0 clause has relaxed nullability
compared to the same query without the clause. For example, suppose the schema of the Parquet
file (`numbers.parquet`) is:
    ```
    message Numbers {
      required int col1;
      optional int col2;
    }
    ```
    
    The view definition does not specify the nullability of columns, and the schema of a Parquet
file is not yet leveraged by Drill's planner. Given the following view:
    ```
    CREATE VIEW dfs.tmp.mynumbers AS SELECT CAST(col1 AS INTEGER) AS col1, CAST(col2 AS INTEGER)
AS col2 FROM dfs.tmp.`numbers.parquet`;
    ```
    (1) For a query with a LIMIT 0 clause, neither the file nor its metadata is read, so Drill assumes
the nullability of both columns is [`columnNullable`](https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSetMetaData.html#columnNullable).
    ```
    SELECT col1, col2 FROM dfs.tmp.mynumbers LIMIT 0;
    ```
    
    (2) For a query without a LIMIT 0 clause, the file is read, so Drill knows the nullability
of `col1` is [`columnNoNulls`](https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSetMetaData.html#columnNoNulls),
and that of `col2` is [`columnNullable`](https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSetMetaData.html#columnNullable).
    ```
    SELECT col1, col2 FROM dfs.tmp.mynumbers LIMIT 1;
    ```
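
One way to observe the difference from a JDBC client is sketched below. It is a minimal illustration, not code from this pull request: the class name `NullabilityReport` and its methods are hypothetical, and the `ALTER SESSION` statement assumes Drill's usual syntax for setting session options. Only standard `java.sql` calls are used.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper class, for illustration only.
public class NullabilityReport {

    // Maps the int returned by ResultSetMetaData.isNullable to the name of
    // the corresponding JDBC constant.
    static String nullabilityName(int code) {
        switch (code) {
            case ResultSetMetaData.columnNoNulls:
                return "columnNoNulls";
            case ResultSetMetaData.columnNullable:
                return "columnNullable";
            case ResultSetMetaData.columnNullableUnknown:
                return "columnNullableUnknown";
            default:
                return "unrecognized(" + code + ")";
        }
    }

    // Turns the option on for the current session (assumed Drill syntax;
    // the option name is the one mentioned in the comment above).
    static void enableLimit0Optimization(Connection conn) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            stmt.execute("ALTER SESSION SET `planner.enable_limit0_optimization` = true");
        }
    }

    // Runs the query and prints the reported nullability of each result column.
    static void printNullability(Connection conn, String sql) throws SQLException {
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            ResultSetMetaData md = rs.getMetaData();
            for (int i = 1; i <= md.getColumnCount(); i++) {
                System.out.println(md.getColumnLabel(i) + ": "
                        + nullabilityName(md.isNullable(i)));
            }
        }
    }
}
```

Given an open Drill JDBC connection, calling `enableLimit0Optimization(conn)` and then `printNullability(conn, ...)` with the queries from (1) and (2) should, per the description above, report `columnNullable` for both columns in the LIMIT 0 case, and `columnNoNulls` for `col1` otherwise.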

