beam-commits mailing list archives

From "Aljoscha Krettek (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-315) GroupByKey/CoGroupByKey doesn't group correctly with FlinkPipelineRunner
Date Tue, 31 May 2016 14:44:12 GMT

    [ https://issues.apache.org/jira/browse/BEAM-315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15307827#comment-15307827
] 

Aljoscha Krettek commented on BEAM-315:
---------------------------------------

I attached a version that uses a {{String}} as the key. With this, the results are also wrong
but "less wrong" than with the {{Key}} class. I think the problem with having {{Key}} as a
key is that {{AvroCoder.consistentWithEquals()}} returns {{false}} and the Flink runner uses the
serialized bytes to do comparisons. I'm not sure how the Dataflow runner deals with this, though.
Also, once the data is large enough for the bug to appear, the pipeline cannot be executed
on the {{DirectPipelineRunner}} or the {{InProcessPipelineRunner}} because both fail with
an OOM exception.
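
To illustrate the general hazard of grouping by serialized bytes rather than by {{equals()}}, here is a minimal sketch. It does not use Beam, Avro or Flink and is not a reproduction of this bug. The class {{ByteKeyGroupingSketch}}, its {{encode()}} method and the map-based key are all hypothetical stand-ins, chosen only because two equal maps can serialize to different bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: not the Key class or any coder from the attached pipeline.
class ByteKeyGroupingSketch {

    // Assumed encoder: writes entries in iteration order, so two maps that are
    // equal according to equals() can still produce different byte sequences.
    static byte[] encode(Map<String, Integer> key) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : key.entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append(';');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Groups keys with equals()/hashCode() and returns the number of groups.
    static int countGroupsByEquals(List<Map<String, Integer>> keys) {
        Map<Map<String, Integer>, Integer> groups = new HashMap<>();
        for (Map<String, Integer> k : keys) {
            groups.merge(k, 1, Integer::sum);
        }
        return groups.size();
    }

    // Groups keys by their encoded bytes and returns the number of groups.
    // ByteBuffer compares by content, so it stands in for a byte-wise comparator.
    static int countGroupsByBytes(List<Map<String, Integer>> keys) {
        Map<ByteBuffer, Integer> groups = new HashMap<>();
        for (Map<String, Integer> k : keys) {
            groups.merge(ByteBuffer.wrap(encode(k)), 1, Integer::sum);
        }
        return groups.size();
    }

    // Two keys with the same entries, inserted in a different order.
    static List<Map<String, Integer>> sampleKeys() {
        Map<String, Integer> a = new LinkedHashMap<>();
        a.put("x", 1);
        a.put("y", 2);
        Map<String, Integer> b = new LinkedHashMap<>();
        b.put("y", 2);
        b.put("x", 1);
        return Arrays.asList(a, b);
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> keys = sampleKeys();
        System.out.println("groups by equals(): " + countGroupsByEquals(keys));
        System.out.println("groups by bytes:    " + countGroupsByBytes(keys));
    }
}
```

In this sketch the two sample keys land in one group when compared with {{equals()}} and in two groups when compared by their encoded bytes, which is the kind of "same key processed multiple times" symptom described in the issue. Whether that is the actual mechanism here is still only my guess.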

> GroupByKey/CoGroupByKey doesn't group correctly with FlinkPipelineRunner
> ------------------------------------------------------------------------
>
>                 Key: BEAM-315
>                 URL: https://issues.apache.org/jira/browse/BEAM-315
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-flink
>    Affects Versions: 0.1.0-incubating
>            Reporter: Pawel Szczur
>         Attachments: CoGroupPipelineStringKey.java
>
>
> Same keys are processed multiple times.
> A repo to reproduce the bug:
> https://github.com/orian/cogroup-wrong-grouping
> Discussion:
> http://mail-archives.apache.org/mod_mbox/incubator-beam-user/201605.mbox/%3CCAB2uKkG2xHsWpLFUkYnt8eEzdxU%3DB_nu6crTwVi-ZuUpugxkPQ%40mail.gmail.com%3E
> Notice: I haven't tested other runners (didn't manage to configure Spark).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
