hadoop-hive-dev mailing list archives

From "Paul Yang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-655) Add support for user defined table generating functions
Date Fri, 13 Nov 2009 22:36:39 GMT

    [ https://issues.apache.org/jira/browse/HIVE-655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777702#action_12777702 ]

Paul Yang commented on HIVE-655:

So I had a discussion with Ning and Namit this morning, and a slightly different syntax for
UDTFs was proposed. Something like:

SELECT pageid, adid FROM myTable LATERAL VIEW explode(adid_list) AS adid ;

where the LATERAL VIEW keyword associates the given UDTF with the table in the FROM clause.
As Ning pointed out, one of the issues with having the UDTF in the SELECT is that queries
like the following

SELECT pageid, explode(adid_list), count(1) FROM myTable GROUP BY pageid;

are a bit confusing, as it's not clear what they are supposed to do. We could disallow these sorts
of operations, but that makes things more complicated for the user. Using LATERAL VIEW also handles
Raghotham's concern about having to specify the input for the UDTF. The UDTF still returns
one column, though multiple values can be returned via an array or a struct.
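
To make the intended semantics concrete, here is a minimal sketch in Python (not Hive code) of what the LATERAL VIEW query above would presumably produce: each source row is paired with every element that explode() emits for it. The sample rows, the helper name lateral_view_explode, and the two readings of count(1) are assumptions for illustration only; they are not taken from the patch.

```python
from collections import Counter


def lateral_view_explode(rows, list_col, out_col):
    """Hypothetical helper: pair each source row with every element of
    row[list_col], exposing the element under the new column out_col."""
    for row in rows:
        for value in row[list_col]:
            out = dict(row)        # keep all columns of the source row
            out[out_col] = value   # add the column generated by the UDTF
            yield out


# Made-up sample contents for myTable.
my_table = [
    {"pageid": "front_page", "adid_list": [1, 2, 3]},
    {"pageid": "contact_page", "adid_list": [3, 4]},
]

# SELECT pageid, adid FROM myTable LATERAL VIEW explode(adid_list) AS adid;
exploded = list(lateral_view_explode(my_table, "adid_list", "adid"))
result = [(r["pageid"], r["adid"]) for r in exploded]
for pageid, adid in result:
    print(pageid, adid)

# Two plausible readings of count(1) ... GROUP BY pageid when explode()
# sits in the SELECT list (assumed readings, shown to illustrate the ambiguity):
count_source_rows = Counter(r["pageid"] for r in my_table)    # count before exploding
count_exploded_rows = Counter(r["pageid"] for r in exploded)  # count after exploding
print(dict(count_source_rows))
print(dict(count_exploded_rows))
```

In this sketch a row whose list is empty contributes no output rows. With the UDTF attached to the FROM clause, only the second reading of count(1) is possible, because aggregation would always see the exploded rows.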

Zheng, do you have any thoughts about the proposed syntax? I know that early on UDTFs were
planned to be in the SELECT clause, and I'm wondering if there were other reasons why UDTFs
should be there. With SELECT, it seemed more straightforward implementation-wise. Also, going
back to TRANSFORM, it does seem like it can fit in FROM too. What was the rationale for having
it in the SELECT?

> Add support for user defined table generating functions
> -------------------------------------------------------
>                 Key: HIVE-655
>                 URL: https://issues.apache.org/jira/browse/HIVE-655
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Raghotham Murthy
>            Assignee: Paul Yang
>         Attachments: HIVE-655.1.patch, HIVE-655.2.patch
> Provide a way for users to add a table generating function, i.e., functions that generate
> multiple rows from a single input row. Currently, the only way to do it is via the TRANSFORM
> clause, which requires streaming the data.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
