drill-issues mailing list archives

From "Khurram Faraaz (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5336) Columns returned by select over CTAS created parquet are not in correct order.
Date Thu, 22 Jun 2017 19:29:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16059885#comment-16059885 ]

Khurram Faraaz commented on DRILL-5336:
---------------------------------------

[~amansinha100] [~jni] Do we know why the column order is different when we use a LIMIT clause in the query?

> Columns returned by select over CTAS created parquet are not in correct order.
> ------------------------------------------------------------------------------
>
>                 Key: DRILL-5336
>                 URL: https://issues.apache.org/jira/browse/DRILL-5336
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 1.10.0
>            Reporter: Khurram Faraaz
>
> The ordering of the columns in the result of a SELECT over the CTAS-created parquet file is not right.
> Column col_int should appear before col_chr; however, col_chr appears before col_int in the result of the SELECT.
> Note that there is a UNION ALL in the CTAS's SELECT statement, and each of the SELECT statements in the UNION ALL has an ORDER BY.
> The problem seems to be with the use of the LIMIT clause in the SELECT on dfs.tmp.temp_tbl_unall.
> Here is the parquet schema for the CTAS-created parquet file. Note that col_int appears before col_chr in the parquet schema too.
> {noformat}
> [root@centos-01 parquet-tools]# hadoop fs -get /tmp/temp_tbl_unall/0_0_0.parquet .
> [root@centos-01 parquet-tools]# ./parquet-schema 0_0_0.parquet
> message root {
>   optional int32 col_int;
>   optional binary col_chr (UTF8);
>   optional binary col_vrchr1 (UTF8);
>   optional binary col_vrchr2 (UTF8);
> }
> {noformat}
> Drill 1.10.0 git commit id : 3dfb4972
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> CREATE TABLE dfs.tmp.temp_tbl_unall as ( SELECT col_int, col_chr, col_vrchr1, col_vrchr2 FROM typeall_l order by col_int ) UNION ALL ( SELECT col_int, col_chr, col_vrchr1, col_vrchr2 FROM typeall_r order by col_int );
> +-----------+----------------------------+
> | Fragment  | Number of records written  |
> +-----------+----------------------------+
> | 0_0       | 1107                       |
> +-----------+----------------------------+
> 1 row selected (0.381 seconds)
> 0: jdbc:drill:schema=dfs.tmp> SELECT * FROM dfs.tmp.temp_tbl_unall ORDER BY col_int LIMIT 100;
> +---------+---------+------------+------------+
> | col_chr | col_int | col_vrchr1 | col_vrchr2 |
> +---------+---------+------------+------------+
> | MI | 0 | Felecia Gourd | NLBQMg9 |
> | DE | 1 | Alvina Jenkins | f9MqJlnNettlCVGcShifgMgnzL5FrZmHysoMBe6kDtA |
> | HI | 1 | Fredrick Vanderburg | eN3CNLW8FE5voAksuJCSYnMdJrVown7my6DiAlI8KhrG69kQoAxKFJmOHPVca1FjGyHWd5Ag53vvODvKB8YwqXcbDihjR0DDbed1cgs7L1tndiPRvU1OreN5ByB8pF0QisgwSBWRKRvS8RVOzA3CyxOpjyxVujRLLlctww0jWwn09m3iINTi6Delw |
> | CA | 19 | John Doe | test string |
> | CA | 19 | John Doe | test string |
> ...
>  
>  
> | LA | 6854 | William Burk | 5krBT7wj8BkoiRUWV9HjkyIT1DRpPj6bNixK15g4gs9IEsKc5myCyzMKQk5k1 |
> +----+------+---------------+-----------------+
> | col_chr | col_int | col_vrchr1 | col_vrchr2 |
> +----+------+---------------+-----------------+
> | IN | 6870 | Caroline Bell | M2811poVmVJLuxqsHz0jzRSGrAJDXfl3UuE0Iz8ldqvRURURvq2dO4Q1358eiureI20NCGBl9lBpoKPc78TWS0gsWhIt280E8JZPQpj7lOJXnHUmvydDiBPgAzNoGn7SSP6xYlnMyBhvWRxB5NF3I9vszosjmpW1Yx7et56QvwLfWBb3unJPnrxVYXX5tAfeyednJ4A90aOE2dhMXy1wLwewMJ91SWBEUM8TU3aGikQ5Ax6dDhDBQLaP |
> +----+------+---------------+-----------------+
> 100 rows selected (0.173 seconds)
> {noformat}
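
One way to narrow down whether LIMIT, ORDER BY, or the star projection is responsible would be to run variants of the failing query and compare the header row of each result. The queries below are a minimal sketch and are not taken from the issue: they reuse only the table and column names shown in the description, and whether any variant returns the columns in schema order on 1.10.0 is an assumption to be checked, not a confirmed result.

{noformat}
-- Variant 1: ORDER BY without LIMIT (is the order still wrong?)
SELECT * FROM dfs.tmp.temp_tbl_unall ORDER BY col_int;

-- Variant 2: LIMIT without ORDER BY
SELECT * FROM dfs.tmp.temp_tbl_unall LIMIT 100;

-- Variant 3: explicit column list with the original ORDER BY and LIMIT.
-- Naming the columns fixes the projection order in the query itself,
-- so it may serve as a workaround if only SELECT * is affected (unverified).
SELECT col_int, col_chr, col_vrchr1, col_vrchr2
FROM dfs.tmp.temp_tbl_unall
ORDER BY col_int
LIMIT 100;
{noformat}

If only the original query (ORDER BY together with LIMIT over SELECT *) shows col_chr first, that would support the suspicion above that the LIMIT clause is involved.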



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
