hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hive/LanguageManual/Transform" by AdamKramer
Date Thu, 15 Jul 2010 01:17:23 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hive/LanguageManual/Transform" page has been changed by AdamKramer.
http://wiki.apache.org/hadoop/Hive/LanguageManual/Transform?action=diff&rev1=19&rev2=20

--------------------------------------------------

  
  Users can also plug in their own custom mappers and reducers in the data stream by using
features natively supported in the Hive language. For example, to run a custom mapper
script (map_script) and a custom reducer script (reduce_script), the user can issue the
following command, which uses the TRANSFORM clause to embed the mapper and the reducer scripts.
  
- By default, columns will be transformed to ''STRING'' and delimited by TAB before feeding
to the user script, and the standard output of the user script will be treated as TAB-separated
''STRING'' columns. User scripts can output debug information to standard error which will
be shown on the task detail page on hadoop. These defaults can be overridden with ''ROW FORMAT''...
+ By default, columns will be transformed to ''STRING'' and delimited by TAB before feeding
to the user script; similarly, all NULL values will be converted to the literal string '''\N'''
in order to differentiate NULL values from empty strings. The standard output of the user
script will be treated as TAB-separated ''STRING'' columns; any cell containing only '''\N'''
will be re-interpreted as a NULL, and the resulting STRING column will then be cast to the
data type specified in the table declaration in the usual way. User scripts can output debug
information to standard error, which will be shown on the task detail page on Hadoop. These
defaults can be overridden with ''ROW FORMAT''...
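
The default conventions described above (TAB-delimited STRING columns, '''\N''' standing for NULL, debug output on standard error) can be sketched as a small user script. This is only an illustrative sketch, not part of the wiki page: the helper names decode_row, encode_row and transform are hypothetical, and the upper-casing step is a stand-in for whatever per-row logic a real script would apply. It assumes the defaults have not been overridden with ''ROW FORMAT''.

```python
import sys

NULL_MARKER = r"\N"  # literal backslash-N, the default NULL representation


def decode_row(line):
    """Split one TAB-delimited input line into cells, mapping \\N to None."""
    cells = line.rstrip("\n").split("\t")
    return [None if cell == NULL_MARKER else cell for cell in cells]


def encode_row(values):
    """Join values into one TAB-delimited line, mapping None back to \\N."""
    return "\t".join(NULL_MARKER if v is None else str(v) for v in values)


def transform(row):
    """Hypothetical per-row logic: upper-case every non-NULL cell."""
    return [None if cell is None else cell.upper() for cell in row]


def main(stdin=sys.stdin, stdout=sys.stdout, stderr=sys.stderr):
    for line_number, line in enumerate(stdin, start=1):
        row = decode_row(line)
        # Debug information goes to standard error, never to standard output,
        # so it does not get mixed into the output columns.
        print(f"row {line_number}: {len(row)} columns", file=stderr)
        print(encode_row(transform(row)), file=stdout)


if __name__ == "__main__":
    main()
```

Keeping NULL as None inside the script, and converting only at the input and output boundaries, preserves the distinction between a NULL cell and an empty string that the '''\N''' convention is meant to provide.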
  
  In the syntax, both ''MAP ...'' and ''REDUCE ...'' can also be written as ''SELECT TRANSFORM
( ... )''.  There is actually no difference among these three forms.
  Hive runs the reduce script in the reduce task (instead of the map task) because of the
''clusterBy''/''distributeBy''/''sortBy'' clause in the inner query.
