flink-user mailing list archives

From Philip Lee <philjj...@gmail.com>
Subject Hi, Flink people, a question about translation from HIVE Query to Flink function by using Table API
Date Sun, 18 Oct 2015 19:16:20 GMT
Hi Flink people, I have a question about translating Hive queries to Flink
functions using the Table API. In short, I am working on a benchmark.

I am Philip Lee, a Master's student in Computer Science at TUB. I
work on translating the Hive queries of a benchmark into Flink code.

While studying this, I came up with a few questions.

*First of all*, for people who do not know the *Hive functions*, let me
briefly explain.

   - *ORDER BY*: guarantees a total order in the output.
   - *SORT BY*: only guarantees the ordering of the rows within a reducer.
   - *GROUP BY*: this is just the GROUP BY clause of SQL.
   - *DISTRIBUTE BY*: all rows with the same DISTRIBUTE BY columns go
   to the same reducer.
   - *CLUSTER BY*: this is DISTRIBUTE BY on a column
   + SORT BY on the same column.
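
To make the definitions above concrete, here is a minimal sketch in plain Python. It is not Hive or Flink code: the sample `rows`, the reducer count, and the stand-in hash function are all assumptions made up for illustration.

```python
# Plain-Python simulation of the Hive clauses; not Hive or Flink API.
from collections import defaultdict

NUM_REDUCERS = 2  # assumed number of reducers, for illustration only

# Hypothetical (key, value) rows.
rows = [("b", 3), ("a", 2), ("c", 1), ("a", 1), ("b", 1)]


def distribute_by(rows, num_reducers=NUM_REDUCERS):
    """DISTRIBUTE BY key: rows with the same key land on the same reducer."""
    reducers = defaultdict(list)
    for key, value in rows:
        # Deterministic stand-in for a hash partitioner.
        reducers[sum(key.encode()) % num_reducers].append((key, value))
    return dict(reducers)


def sort_by(partitions):
    """SORT BY: order the rows within each reducer only."""
    return {r: sorted(p) for r, p in partitions.items()}


def cluster_by(rows):
    """CLUSTER BY key: DISTRIBUTE BY key, then SORT BY key in each reducer."""
    parts = distribute_by(rows)
    return {r: sorted(p, key=lambda row: row[0]) for r, p in parts.items()}


def order_by(rows):
    """ORDER BY: one total order over the whole output."""
    return sorted(rows)


if __name__ == "__main__":
    print("DISTRIBUTE BY:", distribute_by(rows))
    print("CLUSTER BY:   ", cluster_by(rows))
    print("ORDER BY:     ", order_by(rows))
```

In this sketch, only `order_by` gives a global order. `sort_by` and `cluster_by` order rows per reducer, so concatenating the reducer outputs is not sorted in general.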

I just want to check that the Flink functions I use are equivalent to the Hive ones.
< Hive SQL Query = Flink functions >

   - *ORDER BY* = sortPartition(,)
   - *SORT BY* = groupBy(`col).sortPartition(,)
   - *GROUP BY*: this is just the groupBy function.
   - *DISTRIBUTE BY* = groupBy(`col)
   - *CLUSTER BY* = groupBy(`col).sortPartition(,)

I do not see much difference between GROUP BY and DISTRIBUTE BY when I map
them to Flink functions.
In the Hadoop version, we could say that the mapper does the DISTRIBUTE BY.
However, I am not sure what DISTRIBUTE BY corresponds to in Flink. My
guess is that groupBy in Flink is the function that distributes the
rows by the specified key.
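
The distinction I am unsure about can be sketched in plain Python. This is only an illustration under my own assumptions (made-up rows, a parallelism of 2, and a stand-in hash function). It does not use the Flink API and does not claim how Flink implements groupBy.

```python
# Plain-Python illustration: distributing rows by key vs. grouping rows by key.
from itertools import groupby

rows = [("a", 1), ("c", 2), ("a", 3), ("b", 4)]  # hypothetical rows
NUM_PARTITIONS = 2  # assumed parallelism


def partition_of(key):
    """Deterministic stand-in for a hash partitioner."""
    return sum(key.encode()) % NUM_PARTITIONS


# Distribute: each row is sent to one partition; a partition may hold several keys.
partitions = {}
for row in rows:
    partitions.setdefault(partition_of(row[0]), []).append(row)

# Group: rows are split into one logical group per distinct key.
groups = {
    key: list(group)
    for key, group in groupby(sorted(rows, key=lambda r: r[0]), key=lambda r: r[0])
}

if __name__ == "__main__":
    print("partitions:", partitions)
    print("groups:    ", groups)
```

Here the keys "a" and "c" share a partition but still form separate groups, which is the difference between the two notions in this sketch.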

Please feel free to correct what I suggested.

*Secondly*, I want to make sure I understand the difference between the reduce
function and reduceGroup. I guess there must be a trade-off between the two
functions. I know reduceGroup is invoked with an Iterator, but in which cases is
it more appropriate and beneficial to use reduceGroup rather than reduce?
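
As a rough sketch of the two styles being compared, written in plain Python rather than against the Flink API: a reduce-style function combines two values of a key into one value of the same type, while a reduceGroup-style function receives all values of a key as one iterator and may emit any number of results. The helper names `reduce_per_key`, `reduce_group_per_key`, and `top2`, and the sample rows, are hypothetical.

```python
# Plain-Python illustration of reduce-style vs. reduceGroup-style functions.
from itertools import groupby

rows = [("a", 2), ("b", 3), ("a", 1), ("b", 1), ("a", 5)]  # hypothetical rows


def reduce_per_key(rows, combine):
    """reduce-style: fold the values of each key pairwise into a single value."""
    result = {}
    for key, value in rows:
        result[key] = combine(result[key], value) if key in result else value
    return result


def reduce_group_per_key(rows, group_fn):
    """reduceGroup-style: pass all values of a key to group_fn as one iterator."""
    ordered = sorted(rows, key=lambda row: row[0])
    out = []
    for key, group in groupby(ordered, key=lambda row: row[0]):
        out.extend(group_fn(key, (value for _, value in group)))
    return out


def top2(key, values):
    """Needs the whole group at once: emit the two largest values per key."""
    return [(key, v) for v in sorted(values, reverse=True)[:2]]


sums = reduce_per_key(rows, lambda a, b: a + b)  # pairwise combine is enough
tops = reduce_group_per_key(rows, top2)          # needs the full group

if __name__ == "__main__":
    print(sums)
    print(tops)
```

In this sketch a sum fits the pairwise style because partial results can be combined in any order, whereas "top 2 per key" emits several rows per group and so fits the iterator style.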

Best Regards,



*Hae Joon Lee*

Now, in Germany,

M.S. Candidate, Interested in Distributed System, Iterative Processing

Dept. of Computer Science, Informatik in German, TUB

Technical University of Berlin

In Korea,

M.S. Candidate, Computer Architecture Laboratory

Dept. of Computer Science, KAIST

Rm# 4414 CS Dept. KAIST

373-1 Guseong-dong, Yuseong-gu, Daejon, South Korea (305-701)

Mobile) 49) 015-251-448-278 in Germany, no cellular in Korea

