flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Gabor Gevay (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-2237) Add hash-based Aggregation
Date Sat, 26 Sep 2015 09:29:04 GMT

    [ https://issues.apache.org/jira/browse/FLINK-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14909174#comment-14909174
] 

Gabor Gevay commented on FLINK-2237:
------------------------------------

Couldn't the CompactingHashTable be reused here? The driver would get a prober from it, and
then for each incoming record, the driver would call getMatchFor, then do one step of the
reduce, and then write the result back with updateMatch.

I'm interested in this feature because I have a computation where it would really make a huge
difference: I'm calling flatMap on an already large data set, which blows it up to about 10
times the size, and then groupBy and reduce. If Flink had hash-based aggregation, then the
result of the flatMap wouldn't need to be materialized, which would reduce the memory requirement
to about 1/10.

> Add hash-based Aggregation
> --------------------------
>
>                 Key: FLINK-2237
>                 URL: https://issues.apache.org/jira/browse/FLINK-2237
>             Project: Flink
>          Issue Type: New Feature
>            Reporter: Rafiullah Momand
>            Priority: Minor
>
> Aggregation functions at the moment are implemented in a sort-based way.
> How can we implement hash based Aggregation for Flink?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message