drill-issues mailing list archives

From "Vitalii Diravka (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5822) Select * on directory containing multiple json files (one or more empty) with same schema doesn't preserve column order
Date Tue, 24 Oct 2017 11:45:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16216742#comment-16216742 ]

Vitalii Diravka commented on DRILL-5822:
----------------------------------------

[~prasadns14] 
I have reproduced the issue with an ORDER BY clause. Moreover, empty files are not needed
to hit it.
The necessary conditions are:
1. `planner.slice_target` = 1;
2. an ORDER BY clause in the query.

I have updated the description of this Jira accordingly.

It turns out that DRILL-5845 is a separate issue. However, this issue also reproduces with
the TopNBatch operator:
{code}
0: jdbc:drill:zk=local> alter session reset `planner.slice_target`;
+-------+--------------------------------+
|  ok   |            summary             |
+-------+--------------------------------+
| true  | planner.slice_target updated.  |
+-------+--------------------------------+
1 row selected (0.082 seconds)
0: jdbc:drill:zk=local> select * from cp.`tpch/nation.parquet` order by n_name limit 1;
+--------------+----------+--------------+------------------------------------------------------+
| n_nationkey  |  n_name  | n_regionkey  |                      n_comment                       |
+--------------+----------+--------------+------------------------------------------------------+
| 0            | ALGERIA  | 0            |  haggle. carefully final deposits detect slyly agai  |
+--------------+----------+--------------+------------------------------------------------------+
1 row selected (0.141 seconds)
0: jdbc:drill:zk=local> alter session set `planner.slice_target`=1;
+-------+--------------------------------+
|  ok   |            summary             |
+-------+--------------------------------+
| true  | planner.slice_target updated.  |
+-------+--------------------------------+
1 row selected (0.091 seconds)
0: jdbc:drill:zk=local> select * from cp.`tpch/nation.parquet` order by n_name limit 1;
+------------------------------------------------------+----------+--------------+--------------+
|                      n_comment                       |  n_name  | n_nationkey  | n_regionkey  |
+------------------------------------------------------+----------+--------------+--------------+
|  haggle. carefully final deposits detect slyly agai  | ALGERIA  | 0            | 0            |
+------------------------------------------------------+----------+--------------+--------------+
1 row selected (0.201 seconds)
{code} 
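
As a rough client-side check, the two header rows above can be compared programmatically. This is only a minimal sketch and not part of Drill: the helper name `header_columns` is hypothetical, and the header strings are copied by hand from the sqlline output above. In this one example the reordered header also happens to be in alphabetical order; the sketch checks that as an observation, without claiming it is the cause.

```python
def header_columns(row):
    """Split one sqlline header row such as '| a | b |' into column names."""
    return [cell.strip() for cell in row.strip().strip("|").split("|")]


# Header rows copied from the two result sets above (hypothetical variable names).
default_header = "| n_nationkey  |  n_name  | n_regionkey  |                      n_comment                       |"
sliced_header = "|                      n_comment                       |  n_name  | n_nationkey  | n_regionkey  |"

default_cols = header_columns(default_header)
sliced_cols = header_columns(sliced_header)

# Same set of columns in both runs?
same_columns = sorted(default_cols) == sorted(sliced_cols)
# Same column order in both runs?
order_preserved = default_cols == sliced_cols
# Is the reordered header simply sorted by name?
sliced_is_alphabetical = sliced_cols == sorted(sliced_cols)

print(same_columns, order_preserved, sliced_is_alphabetical)
```

For the output above, both runs return the same columns, but the order differs once `planner.slice_target` is set to 1.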


> Select * on directory containing multiple json files (one or more empty) with same schema doesn't preserve column order
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-5822
>                 URL: https://issues.apache.org/jira/browse/DRILL-5822
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - JSON
>    Affects Versions: 1.11.0
>            Reporter: Prasad Nagaraj Subramanya
>            Assignee: Vitalii Diravka
>             Fix For: 1.12.0
>
>
> Repro steps
> 1) Have multiple json files in a directory having the same schema
> 2) Also have one or more empty files 
> Scenarios
> 1) Only one minor fragment
> {code}select * from dfs.`/json_dir`;{code}
> Result:
> {code}
> +----------+------------+------------------------------------------+-----------------+-----------+----------------------------+---------+--------------+----------------+------------------------+
> | row_key  | p_partkey  |                  p_name                  |     p_mfgr      |  p_brand  |           p_type           | p_size  | p_container  | p_retailprice  |       p_comment        |
> +----------+------------+------------------------------------------+-----------------+-----------+----------------------------+---------+--------------+----------------+------------------------+
> | 1        | 1          | goldenrod lace spring peru powder        | Manufacturer#1  | Brand#13  | PROMO BURNISHED COPPER     | 7       | JUMBO PKG    | 901.0          | ly. slyly ironi        |
> | 2        | 2          | blush rosy metallic lemon navajo         | Manufacturer#1  | Brand#13  | LARGE BRUSHED BRASS        | 1       | LG CASE      | 902.0          | lar accounts amo       |
> {code}
>  2) One minor fragment per file
> {code}alter session set `planner.slice_target`=1;
> select * from dfs.`/json_dir`;{code}
> Result:
> {code}
> +-----------+------------------------+--------------+-----------------+------------------------------------------+------------+----------------+---------+----------------------------+----------+
> |  p_brand  |       p_comment        | p_container  |     p_mfgr      |                  p_name                  | p_partkey  | p_retailprice  | p_size  |           p_type           | row_key  |
> +-----------+------------------------+--------------+-----------------+------------------------------------------+------------+----------------+---------+----------------------------+----------+
> | Brand#13  | ly. slyly ironi        | JUMBO PKG    | Manufacturer#1  | goldenrod lace spring peru powder        | 1          | 901.0          | 7       | PROMO BURNISHED COPPER     | 1        |
> | Brand#13  | lar accounts amo       | LG CASE      | Manufacturer#1  | blush rosy metallic lemon navajo         | 2          | 902.0          | 1       | LARGE BRUSHED BRASS        | 2        |
> {code}
>  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
