hive-issues mailing list archives

From "George Pachitariu (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-20262) Implement stats annotation rule for the UDTFOperator
Date Sun, 29 Jul 2018 18:14:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-20262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

George Pachitariu updated HIVE-20262:
-------------------------------------
    Attachment: HIVE-20262.1.patch
        Status: Patch Available  (was: Open)

> Implement stats annotation rule for the UDTFOperator
> ----------------------------------------------------
>
>                 Key: HIVE-20262
>                 URL: https://issues.apache.org/jira/browse/HIVE-20262
>             Project: Hive
>          Issue Type: Improvement
>          Components: Physical Optimizer
>            Reporter: George Pachitariu
>            Assignee: George Pachitariu
>            Priority: Minor
>         Attachments: HIVE-20262.1.patch, HIVE-20262.patch
>
>
> User Defined Table Functions (UDTFs) change the number of rows in the output. A common UDTF is the explode() function, which creates one output row for each element of each array in the input column.
>  
> Right now, the estimated number of output rows is set equal to the number of input rows. If a UDTF produces more than one output row per input row on average, the row count in the execution plan is underestimated.
>  
> Implement a rule that takes a factor X as a parameter and, for each UDTF, predicts that:
>  
> {code:java}
> number of output rows = X * number of input rows{code}
>  
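
For illustration, the proposed estimate could be sketched as below. This is a minimal standalone sketch, not the code in the attached patch: the class name UdtfRowEstimate and the method estimateOutputRows are hypothetical and are not part of Hive's API. It assumes the factor X is a non-negative double and that the estimate should saturate at Long.MAX_VALUE rather than overflow.

```java
public final class UdtfRowEstimate {
    private UdtfRowEstimate() {}

    /**
     * Hypothetical helper: estimates the number of rows a UDTF emits as
     * factor * inputRows, rounded to the nearest long.
     * The result saturates at Long.MAX_VALUE so a large factor cannot overflow.
     */
    public static long estimateOutputRows(long inputRows, double factor) {
        if (inputRows < 0 || factor < 0 || Double.isNaN(factor)) {
            throw new IllegalArgumentException("inputRows and factor must be non-negative");
        }
        double estimate = factor * inputRows;
        // Make the saturation explicit instead of relying on cast behaviour.
        if (estimate >= (double) Long.MAX_VALUE) {
            return Long.MAX_VALUE;
        }
        return Math.round(estimate);
    }

    public static void main(String[] args) {
        long inputRows = 1_000L;
        // A factor of 1.0 reproduces the current behaviour (output rows == input rows).
        System.out.println(estimateOutputRows(inputRows, 1.0)); // 1000
        // An assumed average of 3.5 array elements per row for explode().
        System.out.println(estimateOutputRows(inputRows, 3.5)); // 3500
    }
}
```

With a factor of 1.0 the estimate matches the behaviour described in the issue, so a rule of this shape would only change plans when the factor is set to something else.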



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
