spark-issues mailing list archives

From "Reynold Xin (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-7285) Audit missing Hive functions
Date Fri, 01 May 2015 07:17:06 GMT

     [ https://issues.apache.org/jira/browse/SPARK-7285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin updated SPARK-7285:
-------------------------------
    Description: 
Create a list of functions that are on this page but not in SQL/DataFrame.

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

Here's the list of missing stuff:


basic

between
bitwise operation
bitwiseAND
bitwiseOR
bitwiseXOR
bitwiseNOT

math

round(DOUBLE a)
round(DOUBLE a, INT d): returns a rounded to d decimal places
log2
sqrt(string column name)
bin
hex(long), hex(string), hex(binary)
unhex(string) -> binary
conv
pmod
factorial
toDeg -> toDegrees
toRad -> toRadians
e()
pi()
shiftleft(int or long)
shiftright(int or long)
shiftrightunsigned(int or long)
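
A few of the math entries have semantics that are easy to get wrong, so here is a rough sketch in plain Python (not Spark or Hive code). It assumes pmod follows the ((a % b) + b) % b formula with a JVM-style truncated remainder, and that the shift functions operate on 32-bit ints with the shift count masked to 5 bits, as on the JVM. The helper to_int32 is hypothetical and only exists for this illustration.

```python
def to_int32(x):
    """Wrap a Python int to a signed 32-bit value, as a JVM int would."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x


def pmod(a, b):
    """Positive modulus, assuming the ((a rem b) + b) rem b formula."""
    def rem(x, y):
        # Truncated remainder: the sign follows the dividend (JVM behavior).
        r = abs(x) % abs(y)
        return -r if x < 0 else r
    return rem(rem(a, b) + b, b)


def shiftleft(a, n):
    """Left shift on a 32-bit int; bits shifted past bit 31 are dropped."""
    return to_int32(a << (n & 31))


def shiftright(a, n):
    """Arithmetic (sign-preserving) right shift on a 32-bit int."""
    return to_int32(a) >> (n & 31)


def shiftrightunsigned(a, n):
    """Logical right shift: the sign bit is treated as an ordinary bit."""
    return to_int32((a & 0xFFFFFFFF) >> (n & 31))


print(pmod(-7, 3))                # 2
print(shiftleft(1, 31))           # -2147483648
print(shiftright(-8, 1))          # -4
print(shiftrightunsigned(-8, 1))  # 2147483644
```

The long variants would use a 64-bit mask and a 6-bit shift count instead.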

collection functions

sort_array(array)
size(map, array)
map_values(map<k,v>): array<v>
map_keys(map<k,v>): array<k>
array_contains(array<t>, value): boolean

date functions

from_unixtime(long, string): string
unix_timestamp(): long
unix_timestamp(date): long

year(date): int
month(date): int
day(date): int
dayofmonth(date): int
hour(timestamp): int
minute(timestamp): int
second(timestamp): int
weekofyear(date): int
date_add(date, int)
date_sub(date, int)
from_utc_timestamp(timestamp, string timezone): timestamp
current_date(): date
current_timestamp(): timestamp
add_months(string start_date, int num_months): string
last_day(string date): string
next_day(string start_date, string day_of_week): string
trunc(string date[, string format]): string
months_between(date1, date2): double
date_format(date/timestamp/string ts, string fmt): string
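
To make the expected behavior of two of the string-based date functions concrete, here is a minimal sketch in plain Python (not Spark or Hive code). It assumes dates are 'yyyy-MM-dd' strings, that next_day returns the first date strictly later than start_date, and that only the first two letters of day_of_week matter. The DAYS table is a hypothetical helper for this illustration.

```python
import calendar
from datetime import date, timedelta

# Hypothetical lookup: first two letters of the day name -> Python weekday index.
DAYS = {"MO": 0, "TU": 1, "WE": 2, "TH": 3, "FR": 4, "SA": 5, "SU": 6}


def last_day(start_date):
    """Return the last day of the month that start_date falls in."""
    d = date.fromisoformat(start_date)
    days_in_month = calendar.monthrange(d.year, d.month)[1]
    return d.replace(day=days_in_month).isoformat()


def next_day(start_date, day_of_week):
    """Return the first date strictly after start_date that falls on day_of_week."""
    d = date.fromisoformat(start_date)
    target = DAYS[day_of_week[:2].upper()]
    # 1..7 days ahead; the same weekday maps to a full week later.
    delta = (target - d.weekday() - 1) % 7 + 1
    return (d + timedelta(days=delta)).isoformat()


print(last_day("2015-02-10"))        # 2015-02-28
print(next_day("2015-01-14", "TU"))  # 2015-01-20
```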


conditional functions

if(boolean testCondition, T valueTrue, T valueFalseOrNull): T
nvl(T value, T default_value): T
greatest(T v1, T v2, …): T
least(T v1, T v2, …): T


string functions

ascii(string str): int
base64(binary): string
concat(string|binary A, string|binary B…): string | binary
concat_ws(string SEP, string A, string B…): string
concat_ws(string SEP, array<string>): string
decode(binary bin, string charset): string
encode(string src, string charset): binary
find_in_set(string str, string strList): int
format_number(number x, int d): string
length(string): int
instr(string str, string substr): int
locate(string substr, string str[, int pos]): int
lower(string), lcase(string)
lpad(string str, int len, string pad): string
ltrim(string): string

parse_url(string urlString, string partToExtract [, string keyToExtract]): string
printf(String format, Obj... args): string
regexp_extract(string subject, string pattern, int index): string
regexp_replace(string INITIAL_STRING, string PATTERN, string REPLACEMENT): string
repeat(string str, int n): string
reverse(string A): string
rpad(string str, int len, string pad): string
space(int n): string
split(string str, string pat): array
str_to_map(text[, delimiter1, delimiter2]): map<string, string>
trim(string A): string
unbase64(string str): binary
upper(string A), ucase(string A): string
levenshtein(string A, string B): int
soundex(string A): string
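
Several of the string functions use 1-based positions and return 0 for "not found", which differs from most host languages. The sketch below, in plain Python (not Spark or Hive code), shows the behavior assumed here for instr, find_in_set, lpad and levenshtein. It assumes a non-empty pad string and ignores NULL handling.

```python
def instr(s, substr):
    """1-based position of the first occurrence of substr in s, or 0 if absent."""
    return s.find(substr) + 1


def find_in_set(s, str_list):
    """1-based position of s in a comma-separated list, or 0 if absent.

    Assumed to return 0 when s itself contains a comma.
    """
    if "," in s:
        return 0
    parts = str_list.split(",")
    return parts.index(s) + 1 if s in parts else 0


def lpad(s, length, pad):
    """Left-pad s with pad up to length; longer inputs are truncated to length."""
    if len(s) >= length:
        return s[:length]
    fill = (pad * length)[: length - len(s)]
    return fill + s


def levenshtein(a, b):
    """Edit distance between a and b using the standard row-by-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


print(instr("hello", "l"))                   # 3
print(find_in_set("ab", "abc,b,ab,c,def"))   # 3
print(lpad("hi", 5, "ab"))                   # abahi
print(levenshtein("kitten", "sitting"))      # 3
```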


Misc

hash(a1[, a2…]): int


text

context_ngrams(array<array<string>>, array<string>, int K, int pf): array<struct<string,double>>
ngrams(array<array<string>>, int N, int K, int pf): array<struct<string,double>>
sentences(string str, string lang, string locale): array<array<string>>


*UDAF*

var_samp
stddev_pop
stddev_samp
covar_pop
covar_samp
corr
percentile: array<double>
percentile_approx: array<double>
histogram_numeric: array<struct {'x','y'}>
collect_set  <- we have hashset
collect_list 
ntile
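
For the statistical aggregates, the _pop and _samp variants differ only in the divisor (n versus n - 1). The following is a minimal reference sketch in plain Python (not Spark or Hive code), using the textbook two-pass formulas. var_pop and the _mean helper are included only to support the other definitions, and inputs are assumed to be non-empty with no NULLs.

```python
from math import sqrt


def _mean(xs):
    return sum(xs) / len(xs)


def var_pop(xs):
    """Population variance: squared deviations divided by n."""
    m = _mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def var_samp(xs):
    """Sample variance: squared deviations divided by n - 1."""
    m = _mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def stddev_pop(xs):
    return sqrt(var_pop(xs))


def stddev_samp(xs):
    return sqrt(var_samp(xs))


def covar_pop(xs, ys):
    """Population covariance of paired values."""
    mx, my = _mean(xs), _mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def covar_samp(xs, ys):
    """Sample covariance of paired values."""
    mx, my = _mean(xs), _mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def corr(xs, ys):
    """Pearson correlation coefficient."""
    return covar_pop(xs, ys) / (stddev_pop(xs) * stddev_pop(ys))


xs, ys = [1, 2, 3, 4], [2, 4, 6, 8]
print(var_pop(xs), var_samp(xs))
print(covar_pop(xs, ys), covar_samp(xs, ys))
print(corr(xs, ys))
```

A real implementation would more likely use a single-pass, mergeable update so partial aggregates can be combined across partitions.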








  was:
Create a list of functions that are on this page but not in SQL/DataFrame.

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF




> Audit missing Hive functions
> ----------------------------
>
>                 Key: SPARK-7285
>                 URL: https://issues.apache.org/jira/browse/SPARK-7285
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Reynold Xin
>            Assignee: Reynold Xin




