hadoop-common-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hive/LanguageManual/UDF" by PhiloVivero
Date Tue, 03 May 2011 23:43:43 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hive/LanguageManual/UDF" page has been changed by PhiloVivero.
The comment on this change is: Thanks to Andrew Done for this cool workaround..
http://wiki.apache.org/hadoop/Hive/LanguageManual/UDF?action=diff&rev1=64&rev2=65

--------------------------------------------------

  
  == GROUPing and SORTing on f(column) ==
  
- A typical OLAP pattern is that you have a timestamp column and you want to group by daily
or other less granular date windows than by second. So you might want to select concat(year(dt),month(dt))
and then group on that concat(). But if you attempt to GROUP BY or SORT BY a column on which
you've applied a function, like this:
+ A typical OLAP pattern is that you have a timestamp column and you want to group by daily
or other less granular date windows than by second. So you might want to select concat(year(dt),month(dt))
and then group on that concat(). But if you attempt to GROUP BY or SORT BY a column on which
you've applied a function and alias, like this:
  
  {{{
  select f(col) as fc, count(*) from table_name group by fc
@@ -338, +338 @@

  FAILED: Error in semantic analysis: line 1:69 Invalid Table Alias or Column Reference fc
  }}}
  
- Because you are not able to GROUP BY or SORT BY a column on which a function has been applied.
However, you can reformulate this query with subqueries:
+ Because you are not able to GROUP BY or SORT BY a column alias on which a function has been
applied. There are two workarounds. First, you can reformulate this query with subqueries,
which is somewhat complicated:
- 
- {{{
- select sq.fc,count(*) from (select f(col) as fc from table_name) sq group by sq.fc
- }}}
- 
- You will have to specify all the columns you want along with the f(col) in both the subquery
and the outside (which is obvious on retrospect). The general formula for the f(col) reformulation
is:
  
  {{{
  select sq.fc,col1,col2,...,colN,count(*) from
@@ -352, +346 @@

   group by sq.fc,col1,col2,...,colN
  }}}
  
+ Or you can make sure not to use a column alias, which is simpler:
+ 
+ {{{
+ select f(col) as fc, count(*) from table_name group by f(col)
+ }}}
+ 
  Contact Tim Ellis (tellis) at RiotGames dot com if you would like to discuss this in further
detail.
  

Mime
View raw message