hive-user mailing list archives

From Furcy Pin <>
Subject Re: [Questio]Which record does Hive give the bigger number when I use row_number
Date Wed, 11 Oct 2017 15:25:07 GMT

Either one can receive the bigger row_num, in a nondeterministic fashion
(which is NOT equivalent to random).
Simply put, it will be whichever row Hive happens to process last, which
you have no way of knowing in advance.

If your two rows differ on other columns, you might want to add them to
your ORDER BY clause to ensure consistency.
If you do want to have them randomly shuffled, you can simply use "ORDER BY
cost, rand()"
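
To make both options concrete, here is a rough sketch based on your query.
It assumes that cost_date differs between the two tied rows; if it does
not, you would need some other column (or combination of columns) that is
unique within a user_id to break the tie.

```sql
-- Option 1: deterministic numbering.
-- cost_date is used as a tie-breaker (assumption: it differs between
-- rows that share the same user_id and cost).
SELECT
    user_id,
    cost_date,
    cost,
    row_number() OVER (PARTITION BY user_id ORDER BY cost, cost_date) AS row_num
FROM table_A;

-- Option 2: explicitly random order among rows with the same cost.
-- rand() is evaluated per row, so ties are shuffled on every run.
SELECT
    user_id,
    cost_date,
    cost,
    row_number() OVER (PARTITION BY user_id ORDER BY cost, rand()) AS row_num
FROM table_A;
```

With option 1 the same row gets the same row_num on every run, as long as
the data does not change. With option 2 the result can change from one run
to the next.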

Finally, there are other variants to row_number that behave slightly
differently, check out this link:
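
To give a rough idea of what I mean by variants, I am thinking of rank()
and dense_rank(), which assign the same number to tied rows instead of
picking one arbitrarily. A small sketch on the same table (the output
column names rnk and dense_rnk are just illustrative):

```sql
-- For one user_id with costs 10, 20, 20, 30:
--   row_number() gives 1, 2, 3, 4 (the two rows with cost 20 get 2 and 3
--                                  in an unspecified order)
--   rank()       gives 1, 2, 2, 4 (ties share a number, then a gap)
--   dense_rank() gives 1, 2, 2, 3 (ties share a number, no gap)
SELECT
    user_id,
    cost,
    row_number() OVER (PARTITION BY user_id ORDER BY cost) AS row_num,
    rank()       OVER (PARTITION BY user_id ORDER BY cost) AS rnk,
    dense_rank() OVER (PARTITION BY user_id ORDER BY cost) AS dense_rnk
FROM table_A;
```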

On Wed, Oct 11, 2017 at 4:33 PM, 孙志禹 <> wrote:

> Dear all,
>    Thank you; this is the first time I have had the honor of asking a
> question here.
>     I used the hql script below:
>     -- ---------------------
>             select
>                 user_id
>                 , cost_date  -- datetime
>                 , cost  -- int
>                 , row_number() over( partition by user_id order by cost )
> as row_num
>             from table_A
>     -- ---------------------.
>     The question is: if, for a particular user_id (e.g. user_id =
> '11111'), there are two records with the same cost in the table, I
> know that the row_number function will give the two records different
> row_nums, so which one will get the bigger row_num?
>     Thanks! And it's also okay to me if you give me a web-link which can
> give the answer.
> ----
>     Anci Sun from China
