Date: Tue, 31 May 2016 14:44:12 +0000 (UTC)
From: "Aljoscha Krettek (JIRA)"
To: commits@beam.incubator.apache.org
Reply-To: dev@beam.incubator.apache.org
Subject: [jira] [Commented] (BEAM-315) GroupByKey/CoGroupByKey doesn't group correctly with FlinkPipelineRunner

    [ https://issues.apache.org/jira/browse/BEAM-315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15307827#comment-15307827 ]

Aljoscha Krettek commented on BEAM-315:
---------------------------------------

I attached a version that uses a {{String}} as key. With this, the results are also wrong, but "less wrong" than with the {{Key}} class. I think the problem with having {{Key}} as a key is that {{AvroCoder.consistentWithEquals()}} is {{false}} and the Flink runner uses the serialized bytes to do comparisons. Not sure how the Dataflow runner deals with this, though.

Also, once the data is large enough for the bug to appear, the pipeline cannot be executed on the {{DirectPipelineRunner}} or the {{InProcessPipelineRunner}}, because both fail with an OOM exception.
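
To illustrate the suspected failure mode in isolation, here is a minimal sketch in plain Java with no Beam or Flink dependency. It is not the runner's code and not what {{AvroCoder}} actually does. The class {{ByteGroupingSketch}}, its {{encode}} method and the sample keys are all hypothetical. The only assumption carried over from the comment above is that grouping compares serialized bytes while the encoding is not consistent with {{equals()}}.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ByteGroupingSketch {

    // Hypothetical encoder: writes entries in iteration order, so two maps
    // that are equal according to equals() can produce different bytes.
    static byte[] encode(Map<String, String> key) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : key.entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append(';');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Number of groups when keys are compared with equals()/hashCode().
    static int groupsByEquals(List<Map<String, String>> keys) {
        Map<Map<String, String>, Integer> groups = new HashMap<>();
        for (Map<String, String> k : keys) {
            groups.merge(k, 1, Integer::sum);
        }
        return groups.size();
    }

    // Number of groups when keys are compared by their encoded bytes.
    // ByteBuffer.wrap gives content-based equals() and hashCode().
    static int groupsByBytes(List<Map<String, String>> keys) {
        Map<ByteBuffer, Integer> groups = new HashMap<>();
        for (Map<String, String> k : keys) {
            groups.merge(ByteBuffer.wrap(encode(k)), 1, Integer::sum);
        }
        return groups.size();
    }

    // Two keys with the same entries inserted in a different order:
    // equal as maps, but encoded differently by encode() above.
    static List<Map<String, String>> sampleKeys() {
        Map<String, String> a = new LinkedHashMap<>();
        a.put("user", "1");
        a.put("site", "x");
        Map<String, String> b = new LinkedHashMap<>();
        b.put("site", "x");
        b.put("user", "1");
        return Arrays.asList(a, b);
    }

    public static void main(String[] args) {
        List<Map<String, String>> keys = sampleKeys();
        System.out.println("groups by equals(): " + groupsByEquals(keys));
        System.out.println("groups by bytes:    " + groupsByBytes(keys));
    }
}
```

In this sketch the two sample keys form one group under {{equals()}} but two groups under byte comparison, which matches the reported symptom that the same key is processed more than once. Whether this is what happens with the {{Key}} class in the reproduction repo would still need to be confirmed.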

> GroupByKey/CoGroupByKey doesn't group correctly with FlinkPipelineRunner
> ------------------------------------------------------------------------
>
>                 Key: BEAM-315
>                 URL: https://issues.apache.org/jira/browse/BEAM-315
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-flink
>    Affects Versions: 0.1.0-incubating
>            Reporter: Pawel Szczur
>         Attachments: CoGroupPipelineStringKey.java
>
>
> Same keys are processed multiple times.
> A repo to reproduce the bug:
> https://github.com/orian/cogroup-wrong-grouping
> Discussion:
> http://mail-archives.apache.org/mod_mbox/incubator-beam-user/201605.mbox/%3CCAB2uKkG2xHsWpLFUkYnt8eEzdxU%3DB_nu6crTwVi-ZuUpugxkPQ%40mail.gmail.com%3E
> Notice: I haven't tested other runners (didn't manage to configure Spark).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)