hive-user mailing list archives

From Philip Lee <>
Subject Re: Hi, Hive People urgent question about [Distribute By] function
Date Thu, 22 Oct 2015 16:42:44 GMT
Thanks for your help.

So if we want the same result from Hive as from Spark or another
framework, how should we go about it?
Could you explain in detail?


On Thu, Oct 22, 2015 at 6:25 PM, Gopal Vijayaraghavan wrote:

> > When applying [Distribute By] on Hive to the framework, the function
> > should be partitionByHash on Flink. This is to spread out all the rows
> > distributed by a hash key from Object Class in Java.
>
> Hive does not use the Object hashCode - the identityHashCode is
> inconsistent, so Object.hashCode() is not used.
> ObjectInspectorUtils::hashCode() is the hashcode used by the DBY in hive
> (SORT BY uses a Random number generator).
>
> Cheers,
> Gopal
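
To make the point above concrete, here is a minimal Java sketch of
value-based hash partitioning. It is not Hive's code: the names
HashPartitionSketch, keyHash and partitionFor are hypothetical, and the
31-multiplier combination and sign-bit masking are assumptions chosen for
illustration. It only shows why a hash computed from the column values is
repeatable across JVMs, while an identity hash is not.

```java
// Sketch only: a stand-in for value-based hash partitioning, not Hive's implementation.
final class HashPartitionSketch {

    // Combine the hashes of the distribution columns from their values.
    // Pass only types whose hashCode() is value-based and specified by the
    // Java API (String, Integer, Long, ...). A class that does not override
    // hashCode() falls back to the identity hash, which differs between runs.
    static int keyHash(Object... columns) {
        int h = 0;
        for (Object c : columns) {
            h = 31 * h + (c == null ? 0 : c.hashCode());
        }
        return h;
    }

    // Map a hash to a partition index in [0, numPartitions).
    // Masking the sign bit keeps the result non-negative for negative hashes.
    static int partitionFor(int hash, int numPartitions) {
        return (hash & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        String[] keys = {"a", "b", "a"};
        for (String k : keys) {
            // Equal keys always print the same partition index.
            System.out.println(k + " -> " + partitionFor(keyHash(k), 4));
        }
    }
}
```

To get the same row placement as Hive in Flink or Spark, a custom
partitioner would presumably have to call ObjectInspectorUtils.hashCode()
itself, or reproduce it exactly for every column type; the values produced
by this sketch are not guaranteed to match Hive's.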



*Hae Joon Lee*

Now, in Germany,

M.S. Candidate, Interested in Distributed System, Iterative Processing

Dept. of Computer Science, Informatik in German, TUB

Technical University of Berlin

In Korea,

M.S. Candidate, Computer Architecture Laboratory

Dept. of Computer Science, KAIST

Rm# 4414 CS Dept. KAIST

373-1 Guseong-dong, Yuseong-gu, Daejon, South Korea (305-701)

Mobile) 49) 015-251-448-278 in Germany, no cellular in Korea

