drill-issues mailing list archives

From "Jacques Nadeau (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-2342) Nullability property of the view created from parquet file is not correct
Date Thu, 19 Mar 2015 22:17:38 GMT

    [ https://issues.apache.org/jira/browse/DRILL-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370238#comment-14370238 ]

Jacques Nadeau commented on DRILL-2342:
---------------------------------------

LGTM.

+1

> Nullability property of the view created from parquet file is not correct
> -------------------------------------------------------------------------
>
>                 Key: DRILL-2342
>                 URL: https://issues.apache.org/jira/browse/DRILL-2342
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Metadata
>    Affects Versions: 0.8.0
>            Reporter: Victoria Markman
>            Assignee: Venki Korukanti
>            Priority: Critical
>             Fix For: 0.9.0
>
>         Attachments: DRILL-2342-1.patch, DRILL-2342-3.patch, DRILL-2343-2.patch, t1.parquet
>
>
> Here is my t1 table definition:
> {code}
> message root {
>   optional int32 a1;
>   optional binary b1 (UTF8);
>   optional int32 c1 (DATE);
> }
> {code}
> I created a view on top of it:
> {code}
> 0: jdbc:drill:schema=dfs> create view v1 as select cast(a1 as int), cast(b1 as varchar(10)), cast(c1 as date) from t1;
> +------------+------------+
> |     ok     |  summary   |
> +------------+------------+
> | true       | View 'v1' created successfully in 'dfs.aggregation' schema |
> +------------+------------+
> 1 row selected (0.096 seconds)
> {code}
> IS_NULLABLE says 'NO', which is incorrect.
> {code}
> 0: jdbc:drill:schema=dfs> describe v1;
> +-------------+------------+-------------+
> | COLUMN_NAME | DATA_TYPE  | IS_NULLABLE |
> +-------------+------------+-------------+
> | EXPR$0      | INTEGER    | NO          |
> | EXPR$1      | VARCHAR    | NO          |
> | EXPR$2      | DATE       | NO          |
> +-------------+------------+-------------+
> 3 rows selected (0.067 seconds)
> {code}
> This is potentially dangerous: if Calcite were to take advantage of this property in the future and add an optimization that drops an "is null" predicate when the column is not nullable, the query "select * from v1 where x is null" would return an incorrect result.
> {code}
> 0: jdbc:drill:schema=dfs> explain plan for select * from v1 where z is null;
> +------------+------------+
> |    text    |    json    |
> +------------+------------+
> | 00-00    Screen
> 00-01      Project(x=[$0], y=[$1], z=[$2])
> 00-02        SelectionVectorRemover
> 00-03          Filter(condition=[IS NULL($2)])
> 00-04            Project(x=[CAST($2):ANY NOT NULL], y=[CAST($1):ANY NOT NULL], z=[CAST($0):ANY NOT NULL])
> 00-05              Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/aggregation/t1]], selectionRoot=/aggregation/t1, numFiles=1, columns=[`a1`, `b1`, `c1`]]])
> {code}
> It seems to me that columns in views should always be reported as nullable.
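
The hazard described in the report can be illustrated with a minimal sketch in Python. This is not Drill or Calcite code: the function `filter_is_null`, its `declared_nullable` flag, and the sample rows are all hypothetical, and the "optimization" is only the one the reporter speculates about (folding an IS NULL predicate to FALSE when metadata claims the column is NOT NULL).

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[int]]


def filter_is_null(rows: List[Row], column: str, declared_nullable: bool) -> List[Row]:
    """Evaluate `column IS NULL` under a hypothetical planner rule.

    Assumption: if the metadata declares the column NOT NULL, the planner
    folds the predicate to FALSE and never inspects the data.
    """
    if not declared_nullable:
        # Predicate folded away based on metadata alone.
        return []
    # Metadata allows nulls, so the predicate is actually evaluated.
    return [row for row in rows if row[column] is None]


# Hypothetical data: the underlying column is optional and contains a null.
rows: List[Row] = [{"z": 1}, {"z": None}, {"z": 3}]

# Metadata correctly reports the column as nullable: the null row is found.
correct = filter_is_null(rows, "z", declared_nullable=True)

# Metadata wrongly reports NOT NULL: the null row is silently dropped.
wrong = filter_is_null(rows, "z", declared_nullable=False)

print(correct)
print(wrong)
```

With correct metadata the null row is returned; with the incorrect NOT NULL metadata the same query returns nothing, which is the wrong result the report warns about.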



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
