hive-user mailing list archives

From comptech geeky <comptechge...@gmail.com>
Subject What will be the Rank UDF for this particular scenario?
Date Fri, 20 Jul 2012 04:07:21 GMT
Below is the table data, listed in descending order of CREATED_TIME:



BUYER_ID                ITEM_ID                         CREATED_TIME

1345653                 330760137950                    2012-07-09 21:41:29
1345653                 330760137950                    2012-07-09 21:40:29

1345653                 110909316904                    2012-07-09 21:30:06
1345653                 110909316904                    2012-07-09 21:29:06

1345653                 221065796761                    2012-07-09 19:32:48
1345653                 221065796761                    2012-07-09 19:31:48

1345653                 300729306444                    2012-07-09 19:02:35
1345653                 300729306444                    2012-07-09 19:01:35

1345653                 150851771618                    2012-07-09 18:58:33

1345653                 130724723989                    2012-07-09 16:00:44



I need the TOP 5 from the table above. By TOP 5, I mean the following.

In the table above, the first two rows should be counted as 1, because
BUYER_ID and ITEM_ID are the same in both rows.

The same applies to the third and fourth rows: BUYER_ID and ITEM_ID are
again the same, so they are counted as 2.

The fifth and sixth rows are counted as 3 for the same reason.

The seventh and eighth rows are counted as 4 for the same reason.

The ninth row is counted as 5.


So for TOP 5, I need output like this:



1345653                 330760137950                    2012-07-09 21:41:29
1345653                 330760137950                    2012-07-09 21:40:29

1345653                 110909316904                    2012-07-09 21:30:06
1345653                 110909316904                    2012-07-09 21:29:06

1345653                 221065796761                    2012-07-09 19:32:48
1345653                 221065796761                    2012-07-09 19:31:48

1345653                 300729306444                    2012-07-09 19:02:35
1345653                 300729306444                    2012-07-09 19:01:35

1345653                 150851771618                    2012-07-09 18:58:33


So the problem statement is this:

If the same BUYER_ID and ITEM_ID pair appears two times (this number should
be configurable), those rows are counted as 1.


What would the Rank UDF be for this particular problem? My current Rank UDF
is:


import org.apache.hadoop.hive.ql.exec.UDF;

public final class RankNew extends UDF {
    private int counter;
    private String last_key1;
    private String last_key2;

    public int evaluate(final String key1, final String key2) {
        // Resets only when BOTH keys differ from the stored keys.
        if (!key1.equalsIgnoreCase(this.last_key1)
                && !key2.equalsIgnoreCase(this.last_key2)) {
            this.counter = 0;
            this.last_key1 = key1;
            this.last_key2 = key2;
        }
        // Returns 0 for the first row after a reset, then 1, 2, ...
        return this.counter++;
    }
}
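
One possible shape for the ranking described above, as a minimal sketch rather than a tested Hive UDF. It assumes the rows arrive sorted by BUYER_ID and then by CREATED_TIME descending, and it reads "two times (should be configurable)" as: up to rowsPerGroup consecutive rows with the same BUYER_ID and ITEM_ID share one rank. That reading, the class name GroupRank, and the rowsPerGroup parameter are all assumptions, not part of the original code. The class is kept free of Hive dependencies so it can run on its own; inside Hive the same logic would presumably live in a class extending org.apache.hadoop.hive.ql.exec.UDF.

```java
import java.util.Objects;

/** Hypothetical sketch: assigns a shared rank to consecutive rows with the same (buyer, item). */
final class GroupRank {
    private final int rowsPerGroup; // assumed knob: how many rows may share one rank
    private int rank;               // current rank within the buyer, 1-based (0 = no row seen yet)
    private int rowsInGroup;        // rows already assigned to the current rank
    private String lastBuyer;
    private String lastItem;

    GroupRank(int rowsPerGroup) {
        if (rowsPerGroup < 1) {
            throw new IllegalArgumentException("rowsPerGroup must be >= 1");
        }
        this.rowsPerGroup = rowsPerGroup;
    }

    /** Returns the rank of the current row; rows must arrive in sorted order. */
    int evaluate(String buyerId, String itemId) {
        if (rank == 0 || !Objects.equals(buyerId, lastBuyer)) {
            // First row overall, or a new buyer: ranking starts over.
            rank = 1;
            rowsInGroup = 1;
        } else if (!Objects.equals(itemId, lastItem) || rowsInGroup >= rowsPerGroup) {
            // Same buyer, but a new item or a full group: move to the next rank.
            rank++;
            rowsInGroup = 1;
        } else {
            // Same buyer and item, and the group still has room.
            rowsInGroup++;
        }
        lastBuyer = buyerId;
        lastItem = itemId;
        return rank;
    }

    public static void main(String[] args) {
        // ITEM_ID column of the sample table, already in CREATED_TIME descending order.
        String[] items = {
            "330760137950", "330760137950", "110909316904", "110909316904",
            "221065796761", "221065796761", "300729306444", "300729306444",
            "150851771618", "130724723989"
        };
        GroupRank ranker = new GroupRank(2);
        for (String item : items) {
            int r = ranker.evaluate("1345653", item);
            if (r <= 5) { // keep only the TOP 5 groups
                System.out.println(r + "\t1345653\t" + item);
            }
        }
    }
}
```

On the sample data this keeps the first nine rows (ranks 1 to 5) and drops the row for item 130724723989, which gets rank 6. Because the state lives in instance fields, the result is only meaningful if all rows for a buyer reach the same instance in sorted order.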
