drill-dev mailing list archives

From "Khurram Faraaz (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-5336) Columns returned by select over CTAS created parquet are not in correct order.
Date Thu, 09 Mar 2017 11:59:38 GMT
Khurram Faraaz created DRILL-5336:
-------------------------------------

             Summary: Columns returned by select over CTAS created parquet are not in correct order.
                 Key: DRILL-5336
                 URL: https://issues.apache.org/jira/browse/DRILL-5336
             Project: Apache Drill
          Issue Type: Bug
          Components: Storage - Parquet
    Affects Versions: 1.10.0
            Reporter: Khurram Faraaz


The columns in the result of a SELECT over a CTAS-created parquet file are returned in the
wrong order.
Column col_int should appear before col_chr, but col_chr appears before col_int in the
result of the SELECT.
Note that the CTAS's SELECT statement contains a UNION ALL, and each SELECT statement in the
UNION ALL has an ORDER BY.

The problem seems to be related to the use of the LIMIT clause in the SELECT on dfs.tmp.temp_tbl_unall.

Here is the parquet schema for the CTAS created parquet file. Note that col_int appears before
col_chr in the parquet schema too.

{noformat}
[root@centos-01 parquet-tools]# hadoop fs -get /tmp/temp_tbl_unall/0_0_0.parquet .
[root@centos-01 parquet-tools]# ./parquet-schema 0_0_0.parquet
message root {
  optional int32 col_int;
  optional binary col_chr (UTF8);
  optional binary col_vrchr1 (UTF8);
  optional binary col_vrchr2 (UTF8);
}
{noformat}

Drill 1.10.0 git commit id : 3dfb4972

{noformat}
0: jdbc:drill:schema=dfs.tmp> CREATE TABLE dfs.tmp.temp_tbl_unall as ( SELECT col_int,
col_chr, col_vrchr1, col_vrchr2 FROM typeall_l order by col_int ) UNION ALL ( SELECT col_int,
col_chr, col_vrchr1, col_vrchr2 FROM typeall_r order by col_int );
+-----------+----------------------------+
| Fragment  | Number of records written  |
+-----------+----------------------------+
| 0_0       | 1107                       |
+-----------+----------------------------+
1 row selected (0.381 seconds)
0: jdbc:drill:schema=dfs.tmp> SELECT * FROM dfs.tmp.temp_tbl_unall ORDER BY col_int LIMIT
100;
+---------+---------+------------+------------+
| col_chr | col_int | col_vrchr1 | col_vrchr2 |
+---------+---------+------------+------------+
| MI | 0 | Felecia Gourd | NLBQMg9 |
| DE | 1 | Alvina Jenkins | f9MqJlnNettlCVGcShifgMgnzL5FrZmHysoMBe6kDtA |
| HI | 1 | Fredrick Vanderburg | eN3CNLW8FE5voAksuJCSYnMdJrVown7my6DiAlI8KhrG69kQoAxKFJmOHPVca1FjGyHWd5Ag53vvODvKB8YwqXcbDihjR0DDbed1cgs7L1tndiPRvU1OreN5ByB8pF0QisgwSBWRKRvS8RVOzA3CyxOpjyxVujRLLlctww0jWwn09m3iINTi6Delw
|
| CA | 19 | John Doe | test string |
| CA | 19 | John Doe | test string |
...
 
 
| LA | 6854 | William Burk | 5krBT7wj8BkoiRUWV9HjkyIT1DRpPj6bNixK15g4gs9IEsKc5myCyzMKQk5k1
|
+----+------+---------------+-----------------+
| col_chr | col_int | col_vrchr1 | col_vrchr2 |
+----+------+---------------+-----------------+
| IN | 6870 | Caroline Bell | M2811poVmVJLuxqsHz0jzRSGrAJDXfl3UuE0Iz8ldqvRURURvq2dO4Q1358eiureI20NCGBl9lBpoKPc78TWS0gsWhIt280E8JZPQpj7lOJXnHUmvydDiBPgAzNoGn7SSP6xYlnMyBhvWRxB5NF3I9vszosjmpW1Yx7et56QvwLfWBb3unJPnrxVYXX5tAfeyednJ4A90aOE2dhMXy1wLwewMJ91SWBEUM8TU3aGikQ5Ax6dDhDBQLaP
|
+----+------+---------------+-----------------+
100 rows selected (0.173 seconds)
{noformat}
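
The LIMIT hypothesis stated in the description could be narrowed down with a few variations of the failing query. The statements below are only a suggested sketch: they were not part of the original report, have not been run here, and no results are claimed for them. They reuse the table and column names from the reproduction above.

{noformat}
-- Variation 1: same query without LIMIT, to check whether LIMIT is what triggers the reordering.
SELECT * FROM dfs.tmp.temp_tbl_unall ORDER BY col_int;

-- Variation 2: LIMIT without ORDER BY, to separate the effect of the sort from that of the limit.
SELECT * FROM dfs.tmp.temp_tbl_unall LIMIT 100;

-- Variation 3: explicit column list instead of *, as a possible workaround (assumption, unverified).
SELECT col_int, col_chr, col_vrchr1, col_vrchr2 FROM dfs.tmp.temp_tbl_unall ORDER BY col_int LIMIT 100;
{noformat}

If Variation 1 returns col_int first while the original query does not, that would support the LIMIT explanation. If Variation 3 returns the columns in the listed order, it may serve as a workaround until the star-query behavior is fixed.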



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
