hive-user mailing list archives

From Nitin Pawar <>
Subject Re: difference between partition by and distribute by in rank()
Date Fri, 11 Jul 2014 08:32:02 GMT
As a general principle,
distribute by ensures that each of N reducers gets non-overlapping ranges of X
(the column named in the clause), but it doesn't sort the output of each
reducer. You end up with N or more unsorted files with non-overlapping ranges.
So this is more of a horizontal distribution of data.
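
To make the distribute by behaviour above concrete, here is a rough sketch in Python rather than Hive itself. It assumes rows are routed to reducers by hashing the key, so each reducer sees a disjoint set of key values; the crc32 hash, the distribute_by helper, and the sample rows are all made up for illustration and are not Hive's actual implementation.

```python
import zlib
from collections import defaultdict


def distribute_by(rows, key, num_reducers):
    """Route each row to a reducer by hashing row[key].

    Rows keep their arrival order inside a reducer; nothing is sorted.
    crc32 is only a stand-in for the engine's real hash (assumption).
    """
    reducers = defaultdict(list)
    for row in rows:
        idx = zlib.crc32(str(row[key]).encode()) % num_reducers
        reducers[idx].append(row)
    return dict(reducers)


# Hypothetical sample rows using the column names from the question.
rows = [
    {"p_mfgr": "Manufacturer#1", "p_name": "violet"},
    {"p_mfgr": "Manufacturer#2", "p_name": "almond"},
    {"p_mfgr": "Manufacturer#1", "p_name": "almond"},
    {"p_mfgr": "Manufacturer#3", "p_name": "plum"},
    {"p_mfgr": "Manufacturer#2", "p_name": "rose"},
]

buckets = distribute_by(rows, "p_mfgr", num_reducers=2)

# Set of p_mfgr values that each reducer received.
keys_per_reducer = {
    idx: {row["p_mfgr"] for row in bucket} for idx, bucket in buckets.items()
}

for idx, bucket in sorted(buckets.items()):
    print(idx, [(r["p_mfgr"], r["p_name"]) for r in bucket])
```

Every row for a given p_mfgr lands on the same reducer, but the p_name values within that reducer stay in arrival order unless a sort step is added.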

In my view,
partition by is based more on values, so it's a vertical distribution of data.

I may be wrong in my understanding of this.

On Fri, Jul 11, 2014 at 1:38 PM, Eric Chu <> wrote:

> Does anyone know what
> *rank() over(distribute by p_mfgr sort by p_name) *
> does exactly and how it's different from
> *rank() over(partition by p_mfgr order by p_name)*?
> Thanks,
> Eric

Nitin Pawar