hive-dev mailing list archives

From "Navis (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-9228) Problem with subquery using windowing functions
Date Wed, 28 Jan 2015 02:12:35 GMT

    [ https://issues.apache.org/jira/browse/HIVE-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294595#comment-14294595 ]

Navis commented on HIVE-9228:
-----------------------------

Yes, when a PTF column is not selected, we should prune the function itself in the PTF operator.
But I thought it was a trivial case: not selecting a column that was calculated at heavy cost.
And the select operator would be removed by IdentityProjectRemover if it's not needed.
By the way, could you review HIVE-9138 first? It's hard to debug anything in PTF without
any explain result.

> Problem with subquery using windowing functions
> -----------------------------------------------
>
>                 Key: HIVE-9228
>                 URL: https://issues.apache.org/jira/browse/HIVE-9228
>             Project: Hive
>          Issue Type: Bug
>          Components: PTF-Windowing
>    Affects Versions: 0.13.1
>            Reporter: Aihua Xu
>            Assignee: Aihua Xu
>         Attachments: HIVE-9228.1.patch.txt, HIVE-9228.2.patch.txt, create_table_tab1.sql, tab1.csv
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> The following query with window functions fails, although the inner query works fine on its own:
>
>   select col1, col2, col3
>   from (select col1, col2, col3,
>                count(case when col4=1 then 1 end) over (partition by col1, col2) as col5,
>                row_number() over (partition by col1, col2 order by col4) as col6
>         from tab1) t;
>
> Hive generates an execution plan with 2 jobs:
> 1. The first job basically calculates the window function for col5.
> 2. The second job calculates the window function for col6 and produces the output.
> The plan says the first job outputs the columns (col1, col2, col3, col4) to a temp file,
> since only these columns are used in the later stage. However, the PTF operator for the first job
> outputs (_wcol0, col1, col2, col3, col4), with _wcol0 as the result of the window function,
> even though it's not used.
> In the second job, the map operator still reads the 4 columns (col1, col2, col3, col4)
> from the temp file, following the plan. That causes the exception.
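
The mismatch described in the issue can be illustrated with a small toy model. This is only a sketch in Python, not Hive code: the helper names `read_with_schema` and `prune` and the sample row values are hypothetical, and the error raised here merely stands in for whatever exception Hive actually throws. It shows a writer emitting five columns while the reader expects the four planned ones, and how pruning the unused window column before writing would make the two sides agree.

```python
# Toy model only: these names are hypothetical and are not Hive classes or APIs.
PLANNED_COLUMNS = ["col1", "col2", "col3", "col4"]  # what the plan says job 1 writes
PTF_OUTPUT_COLUMNS = ["_wcol0", "col1", "col2", "col3", "col4"]  # what the PTF operator emits


def read_with_schema(row, schema):
    """Deserialize a positional row using the schema the reader expects."""
    if len(row) != len(schema):
        raise ValueError(f"expected {len(schema)} columns, got {len(row)}")
    return dict(zip(schema, row))


def prune(row, produced, wanted):
    """Keep only the wanted columns, in the order the reader expects."""
    index = {name: i for i, name in enumerate(produced)}
    return tuple(row[index[name]] for name in wanted)


# Sample row in PTF output order: (_wcol0, col1, col2, col3, col4)
ptf_row = (3, "a", "b", "c", 1)

# Reading the unpruned row with the planned 4-column schema fails.
try:
    read_with_schema(ptf_row, PLANNED_COLUMNS)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True

# Dropping the unused window column first makes writer and reader agree.
pruned_row = prune(ptf_row, PTF_OUTPUT_COLUMNS, PLANNED_COLUMNS)
record = read_with_schema(pruned_row, PLANNED_COLUMNS)
```

In this model the fix is applied on the writer side, which corresponds loosely to the suggestion in the comment above to prune the function in the PTF operator when its column is not selected.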



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
