hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hive/LanguageManual/UDF" by DexterFryar
Date Wed, 22 Jun 2011 15:38:16 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hive/LanguageManual/UDF" page has been changed by DexterFryar:
http://wiki.apache.org/hadoop/Hive/LanguageManual/UDF?action=diff&rev1=68&rev2=69

Comment:
Fixed regexp_extract: the 3rd parameter was misspelled and there was no description of what
'index' meant.

  ||string ||ltrim(string A) ||Returns the string resulting from trimming spaces from the
beginning (left-hand side) of A, e.g. ltrim(' foobar ') results in 'foobar ' ||
  ||string ||rtrim(string A) ||Returns the string resulting from trimming spaces from the
end (right-hand side) of A, e.g. rtrim(' foobar ') results in ' foobar' ||
  ||string ||regexp_replace(string A, string B, string C) ||Returns the string resulting from
replacing all substrings in A that match the Java regular expression B (see Java regular
expressions syntax) with C, e.g. regexp_replace("foobar", "oo|ar", "") returns 'fb'. Note that
some care is necessary in using predefined character classes: using '\s' as the second argument
will match the letter s; '\\s' is necessary to match whitespace, etc. ||
- ||string ||regexp_extract(string subject, string pattern, int intex) ||Returns the string
extracted using the pattern. e.g. regexp_extract('foothebar', 'foo(.*?)(bar)', 2) returns
'bar.' Note that some care is necessary in using predefined character classes: using '\s'
as the second argument will match the letter s; '\\s' is necessary to match whitespace, etc.
||
+ ||string ||regexp_extract(string subject, string pattern, int index) ||Returns the string
extracted using the pattern, e.g. regexp_extract('foothebar', 'foo(.*?)(bar)', 2) returns
'bar'. Note that some care is necessary in using predefined character classes: using '\s'
as the second argument will match the letter s; '\\s' is necessary to match whitespace, etc.
 The 'index' parameter selects the capture group, as in the Java regex Matcher group() method:
0 is the entire match, 1 is the first parenthesized group, and so on. See docs/api/java/util/regex/Matcher.html
for more information on the group() method. ||
  ||string ||parse_url(string urlString, string partToExtract [, string keyToExtract]) ||Returns
the specified part from the URL. Valid values for partToExtract include HOST, PATH, QUERY,
REF, PROTOCOL, AUTHORITY, FILE, and USERINFO, e.g. parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1',
'HOST') returns 'facebook.com'. The value of a particular key in QUERY can also be extracted
by providing the key as the third argument, e.g. parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1',
'QUERY', 'k1') returns 'v1'. ||
  ||string ||get_json_object(string json_string, string path) ||Extracts the JSON object from a
JSON string based on the specified JSON path and returns the JSON string of the extracted object.
Returns null if the input JSON string is invalid. '''NOTE: The JSON path can only have
the characters [0-9a-z_], i.e., no upper-case or special characters. Also, the keys *cannot*
start with numbers.''' This is due to restrictions on Hive column names. ||
  ||string ||space(int n) ||Returns a string of n spaces ||
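
Since the regexp_extract entry defers to the Java regex Matcher group() method for the meaning of
'index', a small Java sketch may help show what each index selects. The class RegexGroupDemo and
the helper extract() are hypothetical names used only for illustration; this is not Hive's
implementation, and returning an empty string when nothing matches is an assumption of the sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexGroupDemo {
    // Hypothetical helper (not Hive's code): returns the text captured by
    // group `index` of the first match. Returning "" when there is no match
    // or the group did not participate is an assumption of this sketch.
    static String extract(String subject, String pattern, int index) {
        Matcher m = Pattern.compile(pattern).matcher(subject);
        if (!m.find()) {
            return "";
        }
        String g = m.group(index);
        return g == null ? "" : g;
    }

    public static void main(String[] args) {
        String subject = "foothebar";
        String pattern = "foo(.*?)(bar)";
        System.out.println(extract(subject, pattern, 0)); // entire match: foothebar
        System.out.println(extract(subject, pattern, 1)); // first group: the
        System.out.println(extract(subject, pattern, 2)); // second group: bar
    }
}
```

Index 2 picks the second parenthesized group, which is consistent with the 'bar' result given in
the regexp_extract row above.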
