Return-Path: Delivered-To: apmail-hadoop-common-commits-archive@www.apache.org Received: (qmail 96817 invoked from network); 8 Mar 2010 05:49:10 -0000 Received: from unknown (HELO mail.apache.org) (140.211.11.3) by 140.211.11.9 with SMTP; 8 Mar 2010 05:49:10 -0000 Received: (qmail 68079 invoked by uid 500); 8 Mar 2010 05:48:48 -0000 Delivered-To: apmail-hadoop-common-commits-archive@hadoop.apache.org Received: (qmail 67806 invoked by uid 500); 8 Mar 2010 05:48:45 -0000 Mailing-List: contact common-commits-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: common-dev@hadoop.apache.org Delivered-To: mailing list common-commits@hadoop.apache.org Received: (qmail 67799 invoked by uid 500); 8 Mar 2010 05:48:44 -0000 Delivered-To: apmail-hadoop-core-commits@hadoop.apache.org Received: (qmail 67795 invoked by uid 99); 8 Mar 2010 05:48:44 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 08 Mar 2010 05:48:44 +0000 X-ASF-Spam-Status: No, hits=-2000.0 required=10.0 tests=ALL_TRUSTED X-Spam-Check-By: apache.org Received: from [140.211.11.130] (HELO eos.apache.org) (140.211.11.130) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 08 Mar 2010 05:48:38 +0000 Received: from eos.apache.org (localhost [127.0.0.1]) by eos.apache.org (Postfix) with ESMTP id C417C16E29; Mon, 8 Mar 2010 05:48:17 +0000 (GMT) MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable From: Apache Wiki To: Apache Wiki Date: Mon, 08 Mar 2010 05:48:17 -0000 Message-ID: <20100308054817.25421.24690@eos.apache.org> Subject: =?utf-8?q?=5BHadoop_Wiki=5D_Update_of_=22Hive/LanguageManual/UDF=22_by_Ad?= =?utf-8?q?amKramer?= Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for ch= ange notification. The "Hive/LanguageManual/UDF" page has been changed by AdamKramer. The comment on this change is: Adds a caveat that has been messing me up; a= lso corrects two typoes.. http://wiki.apache.org/hadoop/Hive/LanguageManual/UDF?action=3Ddiff&rev1=3D= 34&rev2=3D35 -------------------------------------------------- <> = NOTE: These commands will be able to show you the latest function documen= tations. + = {{{ SHOW FUNCTIONS; DESCRIBE FUNCTION ; }}} - = =3D=3D Built-in Operators =3D=3D =3D=3D=3D Relational Operators =3D=3D=3D The following operators compare the passed operands and generate a TRUE o= r FALSE value depending on whether the comparison between the operands hold= s. ||<10%>'''Operator''' ||<10%>'''Operand types''' ||'''Description''' || ||A =3D B ||All primitive types ||TRUE if expression A is equal to expres= sion B otherwise FALSE || - ||A =3D=3D B ||None! || Fails because of invalid syntax. SQL uses =3D, no= t =3D=3D || + ||A =3D=3D B ||None! ||Fails because of invalid syntax. SQL uses =3D, not= =3D=3D || ||A <> B ||All primitive types ||TRUE if expression A is NOT equal to exp= ression B otherwise FALSE || ||A < B ||All primitive types ||TRUE if expression A is less than express= ion B otherwise FALSE || ||A <=3D B ||All primitive types ||TRUE if expression A is less than or e= qual to expression B otherwise FALSE || @@ -60, +60 @@ = =3D=3D=3D Operators on Complex Types =3D=3D=3D The following operators provide mechanisms to access elements in Complex= Types - ||<10%> '''Operator''' ||<10%> '''Operand types''' || '''Description''' || + ||<10%>'''Operator''' ||<10%>'''Operand types''' ||'''Description''' || ||A[n] ||A is an Array and n is an int ||Returns the nth element in the a= rray A. The first element has index 0 e.g. if A is an array comprising of [= 'foo', 'bar'] then A[0] returns 'foo' and A[1] returns 'bar' || ||M[key] ||M is a Map and key has type K ||Returns the value corres= ponding to the key in the map e.g. if M is a map comprising of {'f' -> 'foo= ', 'b' -> 'bar', 'all' -> 'foobar'} then M['all'] returns 'foobar' || ||S.x ||S is a struct ||Returns the x field of S. e.g for struct foobar {= int foo, int bar} foobar.foo returns the integer stored in the foo field of= the struct. || @@ -71, +71 @@ =3D=3D Built-in Functions =3D=3D =3D=3D=3D Mathematical Functions =3D=3D=3D The following built-in mathematical functions are supported in hive: - ||<10%> '''Return Type''' ||<15%> '''Name(Signature)''' || '''Description= ''' || + ||<10%>'''Return Type''' ||<15%>'''Name(Signature)''' ||'''Description'''= || ||BIGINT ||round(double a) ||Returns the rounded BIGINT value of the doub= le || ||BIGINT ||floor(double a) ||Returns the maximum BIGINT value that is equ= al or less than the double || ||BIGINT ||ceil(double a), ceiling(double a) ||Returns the minimum BIGINT= value that is equal or greater than the double || @@ -101, +101 @@ = =3D=3D=3D Collection Functions =3D=3D=3D The following built-in collection functions are supported in hive: - ||<10%> '''Return Type''' ||<15%> '''Name(Signature)''' || '''Description= ''' || + ||<10%>'''Return Type''' ||<15%>'''Name(Signature)''' ||'''Description'''= || ||int ||size(Map) ||Returns the number of elements in the map type || ||int ||size(Array) ||Returns the number of elements in the array type= || = @@ -110, +110 @@ = =3D=3D=3D Type Conversion Functions =3D=3D=3D The following type conversion functions are supported in hive: - ||<10%> '''Return Type''' ||<15%> '''Name(Signature)''' || '''Description= ''' || + ||<10%>'''Return Type''' ||<15%>'''Name(Signature)''' ||'''Description'''= || - || ||cast(expr as ) ||Converts the results of the expression= expr to e.g. cast('1' as BIGINT) will convert the string '1' to it = integral representation. A null is returned if the conversion does not succ= eed. || + ||'''Expected "=3D" to follow "type"''' ||cast(expr as ) ||Converts= the results of the expression expr to e.g. cast('1' as BIGINT) will= convert the string '1' to it integral representation. A null is returned i= f the conversion does not succeed. || = = = @@ -146, +146 @@ = =3D=3D=3D String Functions =3D=3D=3D The following are built-in String functions are supported in hive: - ||<10%> '''Return Type''' ||<15%> '''Name(Signature)''' || '''Description= ''' || + ||<10%>'''Return Type''' ||<15%>'''Name(Signature)''' ||'''Description'''= || ||int ||length(string A) ||Returns the length of the string || ||string ||reverse(string A) ||Returns the reversed string || ||string ||concat(string A, string B...) ||Returns the string resulting f= rom concatenating the strings passed in as parameters in order. e.g. concat= ('foo', 'bar') results in 'foobar'. Note that this function can take any nu= mber of input strings. || @@ -157, +157 @@ ||string ||trim(string A) ||Returns the string resulting from trimming sp= aces from both ends of A e.g. trim(' foobar ') results in 'foobar' || ||string ||ltrim(string A) ||Returns the string resulting from trimming s= paces from the beginning(left hand side) of A e.g. ltrim(' foobar ') result= s in 'foobar ' || ||string ||rtrim(string A) ||Returns the string resulting from trimming s= paces from the end(right hand side) of A e.g. rtrim(' foobar ') results in = ' foobar' || - ||string ||regexp_replace(string A, string B, string C) ||Returns the str= ing resulting from replacing all substrings in B that match the Java regula= r expression syntax(See Java regular expressions syntax) with C e.g. regexp= _replace('foobar', 'oo|ar', '') returns 'fb' || - ||string ||regexp_extract(string subject, string pattern, int intex) ||Re= turns the string extracted using the pattern. e.g. regexp_extract('footheba= r', 'foo(.*?)(bar)', 2) returns 'bar' || + ||string ||regexp_replace(string A, string B, string C) ||Returns the str= ing resulting from replacing all substrings in B that match the Java regula= r expression syntax(See Java regular expressions syntax) with C e.g. regexp= _replace("foobar", "oo|ar", "") returns 'fb.' Note that some care is necess= ary in using predefined character classes: using '\s' as the second argumen= t will match the letter s; '\\s' is necessary to match whitespace, etc. || + ||string ||regexp_extract(string subject, string pattern, int intex) ||Re= turns the string extracted using the pattern. e.g. regexp_extract('footheba= r', 'foo(.*?)(bar)', 2) returns 'bar.' Note that some care is necessary in = using predefined character classes: using '\s' as the second argument will = match the letter s; '\\s' is necessary to match whitespace, etc. || ||string ||parse_url(string urlString, string partToExtract) ||Returns th= e specified part from the URL. Valid values for partToExtract include HOST,= PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO. e.g. parse_url(= 'http://facebook.com/path1/p.php?k1=3Dv1&k2=3Dv2#Ref1', 'HOST') returns 'f= acebook.com' Also a value of particular key in QUERY can be extracted using= the syntax QUERY: e.g. QUERY:k1. || ||string ||get_json_object(string json_string, string path) ||Extract jso= n object from a json string based on json path specified, and return json s= tring of the extracted json object. It will return null if the input json s= tring is invalid || ||string ||space(int n) ||Return a string of n spaces || @@ -167, +167 @@ ||string ||lpad(string str, int len, string pad) ||Returns str, left-padd= ed with pad to a length of len || ||string ||rpad(string str, int len, string pad) ||Returns str, right-pad= ded with pad to a length of len || ||list ||split(string str, string pat) ||Split str around pat (pat is a r= egular expression) || - ||int ||find_in_set(string str, string strList) ||Returns the first occur= ance of str in strList where strList is a comman-delimited string. Returns = null if either argument is null. Returns 0 if the first argument contains a= ny commas. e.g. find_in_set('ab', 'abc,b,ab,c,def') returns 3 || + ||int ||find_in_set(string str, string strList) ||Returns the first occur= ance of str in strList where strList is a comma-delimited string. Returns n= ull if either argument is null. Returns 0 if the first argument contains an= y commas. e.g. find_in_set('ab', 'abc,b,ab,c,def') returns 3 || + = + = = = =3D=3D=3D Misc. Functions =3D=3D=3D - = =3D=3D=3D=3D get_json_object =3D=3D=3D=3D A limited version of JSONPath is supported: = @@ -217, +218 @@ hive> SELECT get_json_object(src_json.json, '$.non_exist_key') FROM src_j= son; NULL }}} - = =3D=3D Built-in Aggregate Functions (UDAF) =3D=3D The following are built-in aggregate functions are supported in Hive: - ||<10%> '''Return Type''' ||<10%> '''Name(Signature)''' || '''Description= ''' || + ||<10%>Return Type''' ''' ||<10%>Name(Signature)''' ''' ||Description''' = ''' || ||BIGINT ||count(1), count(DISTINCT col [, col]...) ||count(1) returns th= e number of members in the group, whereas the count(DISTINCT col) gets the = count of distinct values of the columns in the group || - ||double||sum(col), sum(DISTINCT col) ||Returns the sum of the elements i= n the group or the sum of the distinct values of the column in the group || + ||double ||sum(col), sum(DISTINCT col) ||Returns the sum of the elements = in the group or the sum of the distinct values of the column in the group || ||double ||avg(col), avg(DISTINCT col) ||Returns the average of the eleme= nts in the group or the average of the distinct values of the column in the= group || ||double ||min(col) ||Returns the minimum of the column in the group || ||double ||max(col) ||Returns the maximum value of the column n the group= || = + = + = + = =3D=3D Built-in Table-Generating Functions (UDTF) =3D=3D - <> - Normal user-defined functions, such as concat(), take in a single input r= ow and output a single output row. In contrast, table-generating functions = transform a single input row to multiple output rows. Currently, the only t= able-generating function is explode(), which takes in an array as an input = and outputs the elements of the array as separate rows. UDTF's can be used = in the SELECT expression list and as a part of LATERAL VIEW. + <> Normal user-defined functions, such as concat(), take in= a single input row and output a single output row. In contrast, table-gene= rating functions transform a single input row to multiple output rows. Curr= ently, the only table-generating function is explode(), which takes in an a= rray as an input and outputs the elements of the array as separate rows. UD= TF's can be used in the SELECT expression list and as a part of LATERAL VIE= W. = An example use of explode() in the SELECT expression list is as follows: = Consider a table named myTable that has a single column (myCol) and two r= ows: - = - || Array myCol || + ||Array myCol || - || [1,2,3] || + ||[1,2,3] || - || [4,5,6] || + ||[4,5,6] || + = + = + = = Then running the query: = SELECT explode(myCol) AS myNewCol FROM myTable; = Will produce: - = - || int myNewCol || + ||int myNewCol || - || 1 || + ||1 || - || 2 || + ||2 || - || 3 || + ||3 || - || 4 || + ||4 || - || 5 || + ||5 || - || 6 || + ||6 || + = + = + = = Using the syntax "SELECT udtf(col) AS colAlias..." has a few limitations: = - * No other expressions are allowed in SELECT = + * No other expressions are allowed in SELECT - * SELECT pageid, explode(adid_list) AS myCol... is not supported + * SELECT pageid, explode(adid_list) AS myCol... is not supported - * UDTF's can't be nested + * UDTF's can't be nested - * SELECT explode(explode(adid_list)) AS myCol... is not supported + * SELECT explode(explode(adid_list)) AS myCol... is not supported - * GROUP BY / CLUSTER BY / DISTRIBUTE BY / SORT BY is not supported + * GROUP BY / CLUSTER BY / DISTRIBUTE BY / SORT BY is not supported - * SELECT explode(adid_list) AS myCol ... GROUP BY myCol is not suppor= ted + * SELECT explode(adid_list) AS myCol ... GROUP BY myCol is not supported = Please see [wiki:Self:Hive/LanguageManual/LateralView] for an alternative= syntax that does not have these limitations. - = + = The following are built-in table-generating functions are supported in Hi= ve: - ||<10%> '''Return Type''' ||<10%> '''Name(Signature)''' || '''Description= ''' || + ||<10%>Return Type''' ''' ||<10%>Name(Signature)''' ''' ||Description''' = ''' || - || || explode(array a) <> || For each eleme= nt in a, explode() generates a row containing that element || + ||'''Expected "=3D" to follow "type"''' ||explode(array a) <> ||For each element in a, explode() generates a row containing t= hat element || = + = +=20