hadoop-hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Paul Yang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-935) support LATERAL VIEW
Date Mon, 30 Nov 2009 23:10:20 GMT

    [ https://issues.apache.org/jira/browse/HIVE-935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783927#action_12783927
] 

Paul Yang commented on HIVE-935:
--------------------------------

Some notes from an offline discussion

1. To be similar to other vendors, the alias specified after the lateral view might be better
as a table alias instead of as a column alias. For reference, take the query:
{code}
SELECT <expressions> FROM example_table LATERAL VIEW explode(adid_list) AS adid
{code}

Whether the alias 'adid' is for table or a column affects which <expressions> are (in)valid.
If adid were a table alias, then the following query would not work:
{code}
SELECT adid ...
{code}
On the other hand If adid were a column, alias, the following query would not work:
{code}
SELECT adid.* ...
{code}
In either case, the following query would be valid:
{code}
SELECT adid.col0 ...
{code}


2. Following the notion that the output of the UDTF is a table, the output of explode() and
any other GenericUDTF should be a struct.

3. The operator structure of the lateral view is as follows:

For a query such as

{code}
SELECT pageid, adid.* FROM example_table LATERAL VIEW explode(adid_list) AS adid
{code}

The top of the operator tree will look something like (excuse the ASCII):

{code}
*
       [Table Scan]
          /   \
 [Select](*)  [Select](adid_list)
         |      |
         |     [UDTF] (explode)
         \     /
   [Lateral View Join]
            |
            |
  [Select] (pageid, adid.*)
            |
           ....
           
{code}

Rows from the table scan operator will branch in two ways. The left branch will contain a
single select operator that gets all the columns from the table. The right branch will only
get the columns that should be sent to the following UDTF operator. The output of both the
left and right branches will get sent to a lateral view join operator that appropriately joins
the inputs.

An issue with this setup is that for every row coming from the left branch, there may be one
or more rows (that are the output of the UDTF) coming from the right branch. The single row
from the left branch and the multiple rows from the right branch must be associated in some
way. The proposed solution is to depend on the positioning of the operators. If the operator
DAG is setup as previously mentioned, then the lateral view join operator should first see
a row from the left branch, followed by rows from the right branch, followed by a single row
from the left branch again, and so on. Hence, the reception of a row from the left branch
can be used as a delimiter.

For multiple lateral views, additional branches will be added from the table scan operator
that include the appropriate select and UDTF operators. A recursive approach was considered,
but may present issues with generating a Cartesian product

4. With the proposed semantics and operator DAG, the definitions of GenericUDTF. {process(),
close()} must be adjusted slightly. GenericUDTF.close() should not output additional rows
and GenericUDTF.process() should make all the calls to GenericUDTF.forward() that are necessary
for the given row.


> support LATERAL VIEW
> --------------------
>
>                 Key: HIVE-935
>                 URL: https://issues.apache.org/jira/browse/HIVE-935
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Paul Yang
>
> For table functions, the following syntax should be supported
> select ... from T LATERAL VIEW explode(adid_list) as adid,...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message