hive-issues mailing list archives

From "Ashutosh Chauhan (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (HIVE-11930) how to prevent ppd the topN(a) udf predication in where clause?
Date Wed, 23 Sep 2015 14:55:04 GMT

     [ https://issues.apache.org/jira/browse/HIVE-11930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan resolved HIVE-11930.
-------------------------------------
    Resolution: Not A Problem

Your UDF is stateful, so you should mark it as such:
{code}
@UDFType(stateful = true)
{code}

The annotation above excludes predicates that call this UDF from predicate pushdown (PPD).
Also, this is not a bug report but a usage question, which should be posted on the mailing list.
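To see why such a function counts as stateful, here is a plain-Java sketch of what the core of a top-N style UDF like top1000(only_id) might do. The class name `TopNCounter` and its logic (a position counter that restarts whenever the key changes) are assumptions, since the issue does not include the UDF's source. In Hive the class would extend `org.apache.hadoop.hive.ql.exec.UDF` and carry `@UDFType(stateful = true)`; both are left out here so the sketch compiles without Hive on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch of a top-N style UDF body (not the reporter's actual code).
// In Hive it would be declared roughly as:
//   @UDFType(stateful = true)
//   public class TopNCounter extends org.apache.hadoop.hive.ql.exec.UDF { ... }
public class TopNCounter {
    private String lastKey;  // key of the previous row seen by this instance
    private long position;   // 1-based position of the current row within its run of equal keys

    // Returns 1 for the first row of a new key, 2 for the next row with the same key, and so on.
    // The result depends on earlier calls, which is what makes the function stateful.
    public long evaluate(String key) {
        if (Objects.equals(key, lastKey)) {
            position++;
        } else {
            lastKey = key;
            position = 1;
        }
        return position;
    }

    // Feeds a sequence of keys through one instance, as a single task would.
    public static List<Long> positions(List<String> keys) {
        TopNCounter udf = new TopNCounter();
        List<Long> out = new ArrayList<>();
        for (String k : keys) {
            out.add(udf.evaluate(k)); // long is autoboxed to Long
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(positions(List.of("a", "a", "a", "b", "b", "a")));
    }
}
```

Running `main` prints `[1, 2, 3, 1, 2, 1]`. Because `evaluate` can return different values for the same key depending on which rows it saw earlier, moving the call to another stage changes the result, so the optimizer has to be told not to move it.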

> how to prevent ppd the topN(a) udf predication in where clause?
> ---------------------------------------------------------------
>
>                 Key: HIVE-11930
>                 URL: https://issues.apache.org/jira/browse/HIVE-11930
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 0.14.0
>            Reporter: Feng Yuan
>            Priority: Blocker
>
> select a.state_date,a.customer,a.taskid,a.step_id,a.exit_title,a.pv,top1000(a.only_id)
>           from
>                 (  select t1.state_date,t1.customer,t1.taskid,t1.step_id,t1.exit_title,t1.pv,t1.only_id
>                   from 
>                       ( select t11.state_date,
>                                t11.customer,
>                                t11.taskid,
>                                t11.step_id,
>                                t11.exit_title,
>                                t11.pv,
>                                concat(t11.customer,t11.taskid,t11.step_id) as only_id
>                        from
>                           (  select state_date,customer,taskid,step_id,exit_title,count(*) as pv
>                              from bdi_fact2.mid_url_step
>                              where exit_url!='-1'
>                              and exit_title !='-1'
>                              and l_date='2015-08-31'
>                              group by state_date,customer,taskid,step_id,exit_title
>                             )t11
>                        )t1
>                        order by t1.only_id,t1.pv desc
>                  )a
>           where  a.customer='Cdianyingwang'
>           and a.taskid='33'
>           and a.step_id='0' 
>           and top1000(a.only_id)<=10;
> In the example above, the outer predicate top1000(a.only_id)<=10 is pushed down (PPD) into
> stage 1:
> ( select t11.state_date,
>                                t11.customer,
>                                t11.taskid,
>                                t11.step_id,
>                                t11.exit_title,
>                                t11.pv,
>                                concat(t11.customer,t11.taskid,t11.step_id) as only_id
>                        from
>                           (  select state_date,customer,taskid,step_id,exit_title,count(*) as pv
>                              from bdi_fact2.mid_url_step
>                              where exit_url!='-1'
>                              and exit_title !='-1'
>                              and l_date='2015-08-31'
>                              group by state_date,customer,taskid,step_id,exit_title
>                             )t11
>                        )t1
> This stage has 2 reducers, so it outputs 20 records (10 per reducer), and the outer
> stage returns exactly these 20 records.
> Is there any way to hint that this topN UDF predicate should not be pushed down?
> Thanks
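The 20-record result described in the report can be reproduced with a small plain-Java simulation. It is a sketch under stated assumptions: the 50 input rows are made up (all sharing the single only_id 'Cdianyingwang330' left by the outer filters), rows are spread round-robin over 2 reducers instead of by Hive's hash partitioning, and `PositionUdf` is a hypothetical stand-in for top1000, whose source is not shown in the issue.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;

// Hypothetical simulation: the same stateful "position within key <= n" filter evaluated
// either inside each reducer (pushed down) or once after ORDER BY only_id, pv DESC.
public class PushdownDemo {

    record Row(String onlyId, long pv) {}

    // Assumed stand-in for top1000(only_id): 1-based position of a row within its key run.
    static final class PositionUdf {
        private String lastKey;
        private long position;

        long evaluate(String key) {
            if (Objects.equals(key, lastKey)) {
                position++;
            } else {
                lastKey = key;
                position = 1;
            }
            return position;
        }
    }

    // Keeps rows whose position is <= n, with one fresh UDF instance per task.
    static List<Row> filterInEachTask(List<List<Row>> tasks, long n) {
        List<Row> out = new ArrayList<>();
        for (List<Row> taskInput : tasks) {
            PositionUdf udf = new PositionUdf(); // state lives in the task, never shared
            for (Row r : taskInput) {
                if (udf.evaluate(r.onlyId()) <= n) {
                    out.add(r);
                }
            }
        }
        return out;
    }

    // Made-up input: 50 rows for one only_id with pv values 1..50.
    static List<Row> sampleRows() {
        List<Row> rows = new ArrayList<>();
        for (int pv = 1; pv <= 50; pv++) {
            rows.add(new Row("Cdianyingwang330", pv));
        }
        return rows;
    }

    // Predicate pushed down: rows are split round-robin and each reducer filters on its own.
    static List<Row> pushedDown(List<Row> rows, int reducers, long n) {
        List<List<Row>> tasks = new ArrayList<>();
        for (int i = 0; i < reducers; i++) {
            tasks.add(new ArrayList<>());
        }
        for (int i = 0; i < rows.size(); i++) {
            tasks.get(i % reducers).add(rows.get(i));
        }
        return filterInEachTask(tasks, n);
    }

    // Predicate kept above the ORDER BY: one sorted stream, one UDF instance.
    static List<Row> afterGlobalSort(List<Row> rows, long n) {
        List<Row> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparing(Row::onlyId)
                .thenComparing(Row::pv, Comparator.reverseOrder()));
        return filterInEachTask(List.of(sorted), n);
    }

    public static void main(String[] args) {
        System.out.println("pushed down:    " + pushedDown(sampleRows(), 2, 10).size() + " rows");
        System.out.println("after ORDER BY: " + afterGlobalSort(sampleRows(), 10).size() + " rows");
    }
}
```

With this input the pushed-down case keeps 20 rows (10 per reducer, and not the highest-pv ones), while evaluating the filter once after the sort keeps the intended 10. Marking the UDF with `@UDFType(stateful = true)`, as suggested in the reply, keeps Hive from moving the predicate into the reducer stage.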



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
