hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Gopal V (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-20362) PTF: Invert sorting for CURRENT_ROW to UNBOUNDED FOLLOWING
Date Fri, 10 Aug 2018 16:54:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-20362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Gopal V updated HIVE-20362:
---------------------------
    Description: 
{code}
COALESCE(current_page, FIRST_VALUE(current_page,TRUE) OVER ( PARTITION BY user_id ORDER BY
timestamp ROWS BETWEEN CURRENT ROW and UNBOUNDED FOLLOWING))
{code}

This is a spilling PTF because it needs an unseen row to generate the current row when executing
this.

This can be inverted, so that the shuffle sorts it the other way, so the result can be generated
from a previously seen row.

{code}
COALESCE(current_page, LAST_VALUE(current_page,TRUE) OVER ( PARTITION BY user_id ORDER BY
timestamp asc ROWS BETWEEN UNBOUNDED PRECEDING and CURRENT ROW))
{code}

This query pattern is bad in general, but seems to exist as a workaround for - HIVE-18145

  was:
{code}
COALESCE(current_page, FIRST_VALUE(current_page,TRUE) OVER ( PARTITION BY user_id ORDER BY
timestamp ROWS BETWEEN CURRENT ROW and UNBOUNDED FOLLOWING))
{code}

This is a spilling PTF because it needs an unseen row to generate the current row when executing
this.

This can be inverted, so that the shuffle sorts it the other way, so the result can be generated
from a previously seen row.

{code}
COALESCE(current_page, LAST_VALUE(current_page,TRUE) OVER ( PARTITION BY user_id ORDER BY
timestamp ROWS BETWEEN UNBOUNDED PRECEDING and CURRENT ROW))
{code}

This query pattern is bad in general, but seems to exist as a workaround for - HIVE-18145


> PTF: Invert sorting for CURRENT_ROW to UNBOUNDED FOLLOWING 
> -----------------------------------------------------------
>
>                 Key: HIVE-20362
>                 URL: https://issues.apache.org/jira/browse/HIVE-20362
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO, PTF-Windowing
>            Reporter: Gopal V
>            Priority: Major
>
> {code}
> COALESCE(current_page, FIRST_VALUE(current_page,TRUE) OVER ( PARTITION BY user_id ORDER
BY timestamp ROWS BETWEEN CURRENT ROW and UNBOUNDED FOLLOWING))
> {code}
> This is a spilling PTF because it needs an unseen row to generate the current row when
executing this.
> This can be inverted, so that the shuffle sorts it the other way, so the result can be
generated from a previously seen row.
> {code}
> COALESCE(current_page, LAST_VALUE(current_page,TRUE) OVER ( PARTITION BY user_id ORDER
BY timestamp asc ROWS BETWEEN UNBOUNDED PRECEDING and CURRENT ROW))
> {code}
> This query pattern is bad in general, but seems to exist as a workaround for - HIVE-18145



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message