drill-issues mailing list archives

From "Jinfeng Ni (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-1649) JSON : Joining 2 sub-queries (one of them uses flatten) fails with "Hash Join doe not support schema changes"
Date Tue, 11 Nov 2014 23:35:34 GMT

    [ https://issues.apache.org/jira/browse/DRILL-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207329#comment-14207329 ]

Jinfeng Ni commented on DRILL-1649:
-----------------------------------

From the error log, it seems the query fails when the output of "FLATTEN" is referenced directly
using map-like syntax in the outer subquery.

In the query, "transaction" is the reference name for the output of "flatten".

2014-11-06 23:04:39,124 [b209c5d8-da7d-41a9-a153-0b5f6246a481:frag:0:0] WARN  o.a.d.e.e.ExpressionTreeMaterializer
- Unable to find value vector of path SchemaPath [`transaction`.`trans_time`], returning null
instance.

To support this use case, I think the FLATTEN physical operator needs to take two inputs:
one is the input data, the other is the reference name. The reference name would be prefixed
to the SchemaPath of the generated record vectors in the VectorContainer, so that a path such as
`transaction`.`trans_time` can be resolved.
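
A toy sketch of the proposal above, in Python rather than Drill's Java. It only
models the naming idea: plain dicts stand in for record batches, dotted strings
stand in for SchemaPath, and the functions flatten() and resolve() as well as the
sample values are hypothetical, not Drill APIs or data from the attached file.

```python
# Toy model only: dicts stand in for Drill record batches and dotted strings
# stand in for SchemaPath. None of these names are Drill APIs.

def flatten(records, column, ref_name):
    """Emit one output row per element of the repeated `column`.

    Each element's fields are stored under "<ref_name>.<field>", so a later
    lookup of `ref_name`.`field` can be resolved by path.
    """
    out = []
    for record in records:
        # Carry over the non-flattened columns unchanged.
        passthrough = {k: v for k, v in record.items() if k != column}
        for element in record.get(column, []):
            row = dict(passthrough)
            for field, value in element.items():
                row[f"{ref_name}.{field}"] = value
            out.append(row)
    return out


def resolve(row, path):
    """Look up a dotted path; return None when the path is absent."""
    return row.get(path)


# Invented sample values; only the field names come from the query.
records = [
    {"uid": 1, "transactions": [
        {"trans_id": 10, "trans_time": 100},
        {"trans_id": 11, "trans_time": 200},
    ]},
]

rows = flatten(records, "transactions", "transaction")
```

With the prefix applied, resolve(rows[0], "transaction.trans_time") finds a value,
while the unprefixed path "trans_time" does not, which roughly mirrors the
"Unable to find value vector" warning in the log.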

> JSON : Joining 2 sub-queries (one of them uses flatten) fails with "Hash Join doe not support schema changes"
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-1649
>                 URL: https://issues.apache.org/jira/browse/DRILL-1649
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill, Storage - JSON
>            Reporter: Rahul Challapalli
>         Attachments: error.log, single-user-transactions.json
>
>
> git.commit.id.abbrev=60aa446
> I am running this test against Jason's branch which has some fixes to a few flatten issues.
 
> The query below fails:
> {code}
> select event_info.uid, transaction_info.trans_id, event_info.event.evnt_id
> from (
>  select userinfo.transaction.trans_id trans_id, max(userinfo.event.event_time) max_event_time
>  from (
>      select uid, flatten(events) event, flatten(transactions) transaction from `json_kvgenflatten/single-user-transactions.json`
>  ) userinfo
>  where userinfo.transaction.trans_time >= userinfo.event.event_time
>  group by userinfo.transaction.trans_id
> ) transaction_info
> inner join
> (
>  select uid, flatten(events) event
>  from `json_kvgenflatten/single-user-transactions.json`
> ) event_info
> on transaction_info.max_event_time = event_info.event.event_time;
> {code}
> The problem persists even if I create views on top of each sub-query and then join them:
> {code}
> create view v1 as 
> select userinfo.transaction.trans_id trans_id, max(userinfo.event.event_time) max_event_time
>  from (
>      select uid, flatten(events) event, flatten(transactions) transaction from `json_kvgenflatten/single-user-transactions.json`
>  ) userinfo 
>  where userinfo.transaction.trans_time >= userinfo.event.event_time 
>  group by userinfo.transaction.trans_id;
>  
> create view v2 as select uid, flatten(events) event
>  from `json_kvgenflatten/single-user-transactions.json`;
>  
> select v2.uid, v1.trans_id, v2.event.evnt_id
> from v1 inner join v2 
> on v1.max_event_time = v2.event.event_time;
> {code}
> However, if I create 2 files with the exact data from the outputs of the 2 sub-queries and try to join them, then everything works fine.
> I attached the data and the error log files. Let me know if you need anything else.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
