hive-user mailing list archives

From Shahar Glixman <sglix...@outbrain.com>
Subject GenericUDFRank UDF is not working as expected
Date Tue, 23 Jul 2013 22:17:51 GMT
Hello,

I'm trying to use GenericUDFRank, described in
https://issues.apache.org/jira/browse/HIVE-2361. However, no matter
which query I use, the result is not what I expect.
Assume a Hive table of users with the columns:
Country, City, userId

I'm running the following query:

ADD JAR Rank.jar;
CREATE TEMPORARY FUNCTION rank AS
'com.nexr.platform.analysis.udf.GenericUDFRank';

SELECT
  Country,
  City,
  rank(userId)

FROM
  myTable

DISTRIBUTE BY
  Country,
  City

SORT BY
  Country,
  City,
  userId;

For the following table:
US NY 8
US NY 12
US NY 3
US NJ 10
US NJ 26

I'm expecting the following result:
US NY 1
US NY 2
US NY 3
US NJ 1
US NJ 2
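
The expected ranking can be sketched outside Hive as follows. This is only a
minimal Python illustration of the intended semantics (a 1-based position
within each Country/City group, ordered by userId). It is not the UDF and says
nothing about how GenericUDFRank works internally; the function name
rank_within_groups is hypothetical.

```python
from itertools import groupby


def rank_within_groups(rows):
    """Return (country, city, rank) tuples, ranking userId within each group."""
    # Sort by the group key and then by userId, mirroring the intent of
    # DISTRIBUTE BY Country, City / SORT BY Country, City, userId.
    ordered = sorted(rows, key=lambda r: (r[0], r[1], r[2]))
    result = []
    # groupby only merges adjacent rows, which is why the sort comes first.
    for (country, city), group in groupby(ordered, key=lambda r: (r[0], r[1])):
        # Restart the counter at 1 for every (country, city) group.
        for rank, _row in enumerate(group, start=1):
            result.append((country, city, rank))
    return result


rows = [
    ("US", "NY", 8),
    ("US", "NY", 12),
    ("US", "NY", 3),
    ("US", "NJ", 10),
    ("US", "NJ", 26),
]

for row in rank_within_groups(rows):
    print(*row)
```

Because the sketch sorts the group keys, the NJ rows are printed before the NY
rows; the ranks within each group are the ones listed above.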

But I get:
US NY 1
US NY 1
US NY 1
US NJ 1
US NJ 1

I also tried a different rank implementation
(http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/doing_rank_with_hive),
but the results were similar. I guess I'm using the UDF the wrong way, but I
can't find the correct way.
Any help is appreciated.

thanks

