hadoop-common-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hive/HiveQL/UDF" by hliu
Date Thu, 22 Jan 2009 01:38:13 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by hliu:
http://wiki.apache.org/hadoop/Hive/HiveQL/UDF

------------------------------------------------------------------------------
  [[TableOfContents]]
  
+ Built in operators and functions are supported in Hive.
+ 
- == Built in operators ==
+ == Built in Operators ==
  === Relational Operators ===
  The following operators compare the passed operands and generate a TRUE or FALSE value depending
on whether the comparison between the operands holds or not. 
  ||<10%>'''Operator'''||<10%>'''Operand types'''||'''Description'''||
@@ -50, +52 @@

  ||M[key] ||M is a Map<K, V> and key has type K ||returns the value corresponding to
the key in the map e.g. if M is a map comprising of {'f' -> 'foo', 'b' -> 'bar', 'all'
-> 'foobar'} then M['all'] returns 'foobar'||
  ||S.x ||S is a struct ||returns the x field of S e.g for struct foobar {int foo, int bar}
foobar.foo returns the integer stored in the foo field of the struct. ||
  
+ == Built in Functions ==
- === Generic built in Functions ===
+ === Generic Built in Functions ===
  The following are built in functions are supported in hive: 
- ||<10%>'''Return Type'''||<10%>'''Name(Signature)'''||'''Description||
+ ||<8%>'''Return Type'''||<15%>'''Name(Signature)'''||'''Description||
  ||BIGINT ||round(double a) ||returns the rounded BIGINT value of the double||
  ||BIGINT ||floor(double a) ||returns the maximum BIGINT value that is equal or less than
the double||
  ||BIGINT ||ceil(double a) ||returns the minimum BIGINT value that is equal or greater than
the double||
@@ -75, +78 @@

  ||int ||year(string date) ||Return the year part of a date or a timestamp string: year("1970-01-01
00:00:00") = 1970, year("1970-01-01") = 1970||
  ||int ||month(string date) ||Return the month part of a date or a timestamp string: month("1970-11-01
00:00:00") = 11, month("1970-11-01") = 11||
  ||int ||day(string date) ||Return the day part of a date or a timestamp string: day("1970-11-01
00:00:00") = 1, day("1970-11-01") = 1 ||
+ ||string ||get_json_object(string json_string, string path) ||Extract json object from a
json string based on json path specified, and return json string of the extracted json object.
It will return null if the input json string is invalid||
+ 
+ ==== get_json_object ====
+ A limited version of JSONPath supported:
+    *    $   : Root object
+    *    .   : Child operator
+    *    []  : Subscript operator for array
+    *    *   : Wildcard for []
+ Syntax not supported that's worth noticing:
+    *    ''  : Zero length string as key
+    *    ..  : Recursive descent
+    *    @   : Current object/element
+    *    ()  : Script expression
+    *    ?() : Filter (script) expression.
+    *    [,] : Union operator
+    *    [start:end:step] : array slice operator
+ Example
+ src_json table is a single column (json), single row table:
+ {{{
+ +-----------------------------------------------------------------------------------------------------------------------------------------+
+                                json
+ +-----------------------------------------------------------------------------------------------------------------------------------------+
+ {"store":                                                                              
                                                 
+   {"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],                   
                                                  
+    "book":[{"author":"Nigel Rees","category":"reference","title":"Sayings of the Century","price":8.95},
+            {"author":"Herman Melville","category":"fiction","title":"Moby Dick","price":8.99,"isbn":"0-553-21311-3"},
+            {"author":"J. R. R. Tolkien","category":"fiction","title":"The Lord of the Rings","price":22.99,"isbn":"0-395-19395-8"}
+           ],
+    "bicycle":{"price":19.95,"color":"red"}
+   },
+  "email":"amy@only_for_json_udf_test.net",
+  "owner":"amy"
+ 
+ +-----------------------------------------------------------------------------------------------------------------------------------------+
+ }}}
+ {{{
+ SELECT get_json_object(src_json.json, '$') FROM src_json;
+ 
+ SELECT get_json_object(src_json.json, '$.owner') FROM src_json; 
+ 
+ SELECT get_json_object(src_json.json, '$.store.book[0]') FROM src_json;
+ 
+ SELECT get_json_object(src_json.json, '$.non_exist_key') FROM src_json;
+ }}}
+ {{{
+ {"store":{"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],"book":[{"author":"Nigel
Rees","category":"reference","title":"Sayings of the Century","price":8.95},{"author":"Herman
Melville","category":"fiction","title":"Moby Dick","price":8.99,"isbn":"0-553-21311-3"},{"author":"J.
R. R. Tolkien","category":"fiction","title":"The Lord of the Rings","price":22.99,"reader":[{"name":"bob","age":25},{"name":"jack","age":26}],"isbn":"0-395-19395-8"}],"basket":[[1,2,{"b":"y","a":"x"}],[3,4],[5,6]],"bicycle":{"price":19.95,"color":"red"}},"email":"amy@only_for_json_udf_test.net","owner":"amy"}
+ 
+ amy
+ 
+ {"author":"Nigel Rees","category":"reference","title":"Sayings of the Century","price":8.95}
   
+ 
+ NULL
+ }}}
  
  
  === Built in Aggregate (GROUP BY) Functions ===
@@ -86, +142 @@

  ||DOUBLE ||min(col) ||returns the minimum of the column in the group||
  ||DOUBLE ||max(col) ||returns the maximum value of the column n the group||
  
- 
- 
- 
- 
- 
- == UDF supported by Hive ==
- UDF are built in functions in Hive. Hive supports two type of UDFs: generic built in functions,
and built in aggregate (GROUP BY) functions.
- 
- 
- === Generic built in Functions ===
- 
- ==== count ====
- Returns the number of members in the group, whereas the count(DISTINCT col) gets the count
of distinct values of the columns in the group. The result is a BIGINT value. The DISTINCT
option can be used to return the average of the distinct values of col.
- {{{
- count(1), count(DISTINCT col [, col]...) 
- }}}
- 
- 
- === Built in Aggregate (GROUP BY) Functions ===
  
  ==== count ====
  Returns the number of members in the group, whereas the count(DISTINCT col) gets the count
of distinct values of the columns in the group. The result is a BIGINT value. The DISTINCT
option can be used to return the average of the distinct values of col.

Mime
View raw message