drill-issues mailing list archives

From "Prasad Nagaraj Subramanya (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DRILL-5719) Join query on a non-existent column in a JSON file runs longer than usual
Date Mon, 14 Aug 2017 16:57:00 GMT

     [ https://issues.apache.org/jira/browse/DRILL-5719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasad Nagaraj Subramanya updated DRILL-5719:
---------------------------------------------
    Description: 
1) Join query on two JSON files
Column exists:
{code}
select t.p_partkey, t1.ps_partkey from dfs.`testData/part.json` as t RIGHT JOIN dfs.`testData/partsupp.json`
as t1 ON t.p_partkey = t1.ps_partkey;
{code}
Column doesn't exist (the part.json file has no key named partkey):
{code}
select t.partkey, t1.ps_partkey from dfs.`testData/part.json` as t RIGHT JOIN dfs.`testData/partsupp.json`
as t1 ON t.partkey = t1.ps_partkey;
{code}

part.json and partsupp.json: TPC-H SF1 dataset

Time taken when:
1) the column exists in the file: 20 seconds
2) the column doesn't exist in the file: 15 minutes

  was:
1) Join query on two JSON files
Column exists:
{code}
select t.p_partkey, t1.ps_partkey from dfs.`testData/part.json` as t RIGHT JOIN dfs.`testData/partsupp.json`
as t1 ON t.p_partkey = t1.ps_partkey;
{code}
Column doesn't exist:
{code}
select t.partkey, t1.ps_partkey from dfs.`testData/part.json` as t RIGHT JOIN dfs.`testData/partsupp.json`
as t1 ON t.partkey = t1.ps_partkey;
{code}
The part.json file has no key named partkey.

part.json and partsupp.json: TPC-H SF1 dataset

Time taken when:
1) the column exists in the file: 20 seconds
2) the column doesn't exist in the file: 15 minutes


> Join query on a non-existent column in a JSON file runs longer than usual
> -------------------------------------------------------------------------
>
>                 Key: DRILL-5719
>                 URL: https://issues.apache.org/jira/browse/DRILL-5719
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - JSON
>    Affects Versions: 1.11.0
>            Reporter: Prasad Nagaraj Subramanya
>
> 1) Join query on two JSON files
> Column exists:
> {code}
> select t.p_partkey, t1.ps_partkey from dfs.`testData/part.json` as t RIGHT JOIN dfs.`testData/partsupp.json`
> as t1 ON t.p_partkey = t1.ps_partkey;
> {code}
> Column doesn't exist (the part.json file has no key named partkey):
> {code}
> select t.partkey, t1.ps_partkey from dfs.`testData/part.json` as t RIGHT JOIN dfs.`testData/partsupp.json`
> as t1 ON t.partkey = t1.ps_partkey;
> {code}
> part.json and partsupp.json: TPC-H SF1 dataset
> Time taken when:
> 1) the column exists in the file: 20 seconds
> 2) the column doesn't exist in the file: 15 minutes



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)