hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From John Sichi <jsi...@facebook.com>
Subject Re: rownum, row_number() or looping ability with hiveql?
Date Tue, 25 May 2010 19:20:46 GMT
In your simple example, you can probably use the LIMIT clause, but for more advanced cases,
here's a patch for a ROW_SEQUENCE UDF (not committed to trunk yet):

https://issues.apache.org/jira/browse/HIVE-1304

The caveat is that since we don't actually have a SQL/OLAP implementation yet, you have to
use ORDER BY at the nested query level (rather than the OVER clause level where it belongs)
and cross your fingers.

JVS

On May 25, 2010, at 12:13 PM, Kortni Smith wrote:

Hi,

Is there a hive equivalent to Oracle’s rownum, row_number() or the ability to loop through
a resultset?

I have been struggling to create a hive query that will give me max X records, per something,
when sorted by something.  For example, I have book data, multiple records for any given isbn,
and want the lowest 5 priced books per isbn.

I can accomplish this in oracle with the following:

select isbn, price from
(
select isbn, price, row_number() over (partition by isbn order by price asc) rn
from kstest
)
where rn <= 5;

Any  ideas would be greatly appreciated.
Thank you,

Kortni Smith | Software Developer
AbeBooks.com  <http://www.abebooks.com/> Passion for books.

ksmith@abebooks.com<mailto:ksmith@abebooks.com>
phone: 250.412.3272  |  fax: 250.475.6014

Suite 500 - 655 Tyee Rd. Victoria, BC. Canada V9A 6X5

www.abebooks.com  |  www.abebooks.co.uk  |  www.abebooks.de
www.abebooks.fr  |  www.abebooks.it  |  www.iberlibro.com



Mime
View raw message