hadoop-common-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hive/LanguageManual/LateralView" by PaulYang
Date Tue, 05 Jan 2010 01:03:28 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hive/LanguageManual/LateralView" page has been changed by PaulYang.
http://wiki.apache.org/hadoop/Hive/LanguageManual/LateralView

--------------------------------------------------

New page:
<<TableOfContents>>

== Lateral View Synatx ==


{{{

lateralView: LATERAL VIEW udtf(expression) tableAlias AS columnAlias (',' columnAlias)*
fromClause: FROM baseTable (lateralView)*

}}}


== Description ==

Lateral view is used in conjunction with user-defined table generating functions such as explode().
As mentioned in [wiki:Self:Hive/LanguageManual/UDF#UDTF], a UDTF generates one or more rows
for each input row. A lateral view constructs a virtual table by applying the UDTF to each
row of base table and using the resulting output rows to form a virtual table having the supplied
table alias. In addition, each input row from the base table is joined with the respective
output rows in the virtual table.

== Example ==

Consider the following base table named pageAds. It has two columns: pageid (name of the page)
and adid_list (an array of ads appearing on the page):

|| string pageid || Array<int> adid_list ||
|| "front_page" || [1, 2, 3] ||
|| "contact_page" || [3, 5, 6] ||

and the user would like to count the total number of times an ad appears across all pages.

A lateral view with explode() can be used to convert adid_list into separate rows using the
query:
{{{
SELECT pageid, adid FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid;
}}}

The resulting output will be

|| string pageid || int adid ||
|| "front_page" || 1 ||
|| "front_page" || 2 ||
|| "front_page" || 3 ||
|| "contact_page" || 3 ||
|| "contact_page" || 4 ||
|| "contact_page" || 5 ||

Then in order to count the number of times a particular ad appears, count/group by can be
used
{{{
SELECT adid, count(1) FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid GROUP BY
adid;
}}}

|| int adid || count(1) ||
|| 1 || 1 ||
|| 2 || 1 ||
|| 3 || 2 ||
|| 4 || 1 ||
|| 5 || 1 ||

== Multiple Lateral Views ==

A FROM clause can have multiple LATERAL VIEW clauses. Subsequent LATERAL VIEWS can reference
columns from any of the tables appearing to the left of the LATERAL VIEW.

For example, the following could be a valid query:
{{{
SELECT * FROM exampleTable LATERAL VIEW explode(col1) myTable1 AS myCol1 LATERAL VIEW explode(myCol1)
myTable2 AS myCol2;
}}} 

LATERAL VIEW clauses are applied in the order that they appear. For example with the following
base table:

|| Array<int> col1 || Array<string> col2 ||
|| [1, 2] || ["a", "b", "c"] ||
|| [3, 4] || ["d", "e", "f"] ||

The query:

{{{
SELECT myCol1, col2 FROM baseTable LATERAL VIEW explode(col1) myTable1 AS myCol1;
}}}

Will produce:

|| int mycol1 || Array<string> col2 ||
|| 1 || ["a", "b", "c"] ||
|| 2 || ["a", "b", "c"] ||
|| 3 || ["d", "e", "f"] ||
|| 4 || ["d", "e", "f"] ||

The query (similar to above but with an additional LATERAL VIEW):

{{{
SELECT myCol1, myCol2 FROM baseTable LATERAL VIEW explode(col1) myTable1 AS myCol1 LATERAL
VIEW explode(col2) myTable2 AS myCol2;
}}}

Will produce:

||int myCol1 || string myCol2 ||
|| 1 || "a" ||
|| 1 || "b" ||
|| 1 || "c" ||
|| 2 || "a" ||
|| 2 || "b" ||
|| 2 || "c" ||
|| 3 || "d" ||
|| 3 || "e" ||
|| 3 || "f" ||
|| 4 || "d" ||
|| 4 || "e" ||
|| 4 || "f" ||

Mime
View raw message