Date: Thu, 14 Jul 2016 01:16:20 +0000 (UTC)
From: "Greg Hogan (JIRA)"
To: issues@flink.apache.org
Subject: [jira] [Closed] (FLINK-3477) Add hash-based combine strategy for ReduceFunction

[ https://issues.apache.org/jira/browse/FLINK-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Greg Hogan closed FLINK-3477.
-----------------------------
    Resolution: Implemented

Implemented in 52e191a5067322e82192314c16e70ae9e937ae2c

> Add hash-based combine strategy for ReduceFunction
> --------------------------------------------------
>
>                 Key: FLINK-3477
>                 URL: https://issues.apache.org/jira/browse/FLINK-3477
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Local Runtime
>            Reporter: Fabian Hueske
>            Assignee: Gabor Gevay
>
> This issue is about adding a hash-based combine strategy for ReduceFunctions.
> The interface of the {{reduce()}} method is as follows:
> {code}
> public T reduce(T v1, T v2)
> {code}
> The input and output types are identical, and the function returns exactly one value. A ReduceFunction is applied incrementally to compute a final aggregated value. This makes it possible to hold the pre-aggregated value for each key in a hash table and update it with each function call.
> The hash-based strategy requires a special in-memory hash table implementation. The hash table should support in-place updates of elements (if the updated value has the same binary length as the old value) as well as appending updates that invalidate the old value (if the binary length of the new value differs). The hash table must also be able to evict and emit all elements when it runs out of memory.
> We should also add {{HASH}} and {{SORT}} compiler hints to {{DataSet.reduce()}} and {{Grouping.reduce()}} so that users can pick the execution strategy.
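The update rules in the issue (overwrite in place when the binary length is unchanged, otherwise append and invalidate the old bytes, and emit everything when memory runs out) can be illustrated with a small sketch. This is a hypothetical, simplified model, not Flink's implementation: the class name `HashCombinerSketch`, the use of `String` keys and values, and UTF-8 encoding as a stand-in for Flink's binary serialization are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

/**
 * Hypothetical sketch (not Flink's code) of a hash-based combiner for a
 * reduce(T, T) function. Values live serialized in a fixed-size byte arena;
 * an index maps each key to the {offset, length} of its current value.
 */
public class HashCombinerSketch {
    private final byte[] arena;
    private int appendPos = 0;                                      // next free byte in the arena
    private final Map<String, int[]> index = new LinkedHashMap<>(); // key -> {offset, length}
    private final BinaryOperator<String> reducer;
    private final List<String> output = new ArrayList<>();         // emitted "key=value" partials
    int inPlaceUpdates = 0;
    int appendUpdates = 0;

    HashCombinerSketch(int arenaBytes, BinaryOperator<String> reducer) {
        this.arena = new byte[arenaBytes];
        this.reducer = reducer;
    }

    public void add(String key, String value) {
        int[] slot = index.get(key);
        if (slot == null) {
            insert(key, value);
            return;
        }
        String old = new String(arena, slot[0], slot[1], StandardCharsets.UTF_8);
        byte[] updated = reducer.apply(old, value).getBytes(StandardCharsets.UTF_8);
        if (updated.length == slot[1]) {
            // Same binary length: overwrite the old value in place.
            System.arraycopy(updated, 0, arena, slot[0], updated.length);
            inPlaceUpdates++;
        } else if (appendPos + updated.length <= arena.length) {
            // Different length: append the new value; the old bytes become dead space.
            System.arraycopy(updated, 0, arena, appendPos, updated.length);
            index.put(key, new int[] {appendPos, updated.length});
            appendPos += updated.length;
            appendUpdates++;
        } else {
            // Out of memory: emit all partial aggregates (including this key's old one),
            // then restart with the incoming record. A final reduce merges the partials.
            flush();
            insert(key, value);
        }
    }

    private void insert(String key, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        if (bytes.length > arena.length) {
            throw new IllegalArgumentException("record larger than the arena");
        }
        if (appendPos + bytes.length > arena.length) {
            flush(); // no room left: evict everything first
        }
        System.arraycopy(bytes, 0, arena, appendPos, bytes.length);
        index.put(key, new int[] {appendPos, bytes.length});
        appendPos += bytes.length;
    }

    /** Emits every live entry and resets the arena; also call this at end of input. */
    public void flush() {
        for (Map.Entry<String, int[]> e : index.entrySet()) {
            int[] slot = e.getValue();
            output.add(e.getKey() + "=" + new String(arena, slot[0], slot[1], StandardCharsets.UTF_8));
        }
        index.clear();
        appendPos = 0;
    }

    public List<String> output() {
        return output;
    }
}
```

With a summing reducer on numeric strings, adding `"5"` then `"7"` for the same key changes the length (`"12"`), so the value is appended; adding `"3"` next yields `"15"`, which has the same length and is overwritten in place. Because a combiner only produces partial aggregates, emitting the same key more than once after an eviction is harmless. This sketch simply flushes when the arena fills up; a more careful implementation might first compact the dead space left by appending updates.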