flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Márton Balassi <balassi.mar...@gmail.com>
Subject Re: streaming GroupBy + Fold
Date Mon, 05 Oct 2015 21:48:59 GMT
Martin, I have looked at your code and you are running a fold in a window,
that is a very important distinction - the code paths are separate.
Those code paths have been recently touched by Aljoscha if I am not
mistaken.

I have mocked up a simple example and could not reproduce your problem
unfortunately. [1] Could you maybe produce a minimalistic example that we
can actually execute? :)

[1]
https://github.com/mbalassi/flink/commit/9f1f02d05e2bc2043a8f514d39fbf7753ea7058d

On Mon, Oct 5, 2015 at 10:06 PM, Márton Balassi <balassi.marton@gmail.com>
wrote:

> Thanks, I am checking it out tomorrow morning.
>
> On Mon, Oct 5, 2015 at 9:59 PM, Martin Neumann <mneumann@sics.se> wrote:
>
>> Hej,
>>
>> Sorry it took so long to respond I needed to check if I was actually
>> allowed to share the code since it uses internal datasets.
>>
>> In the appendix of this email you will find the main class of this job
>> without the supporting classes or the actual dataset. If you want to run it
>> you need to replace the dataset by something else but that should be
>> trivial.
>> If you just want to see the problem itself, have a look at the appended
>> log in conjunction with the code. Each ERROR printout in the log relates to
>> an accumulator receiving wrong values.
>>
>> cheers Martin
>>
>> On Sat, Oct 3, 2015 at 11:29 AM, Márton Balassi <balassi.marton@gmail.com
>> > wrote:
>>
>>> Hey,
>>>
>>> Thanks for reporting the problem, Martin. I have not merged the PR
>>> Stephan
>>> is referring to yet. [1] There I am cleaning up some of the internals
>>> too.
>>> Just out of curiosity, could you share the code for the failing test
>>> please?
>>>
>>> [1] https://github.com/apache/flink/pull/1155
>>>
>>> On Fri, Oct 2, 2015 at 8:26 PM, Martin Neumann <mneumann@sics.se> wrote:
>>>
>>> > One of my colleagues found it today when we where hunting bugs today.
>>> We
>>> > where using the latest 0.10 version pulled from maven this morning.
>>> > The program we where testing is new code so I cant tell you if the
>>> behavior
>>> > has changed or if it was always like this.
>>> >
>>> > On Fri, Oct 2, 2015 at 7:46 PM, Stephan Ewen <sewen@apache.org> wrote:
>>> >
>>> > > I think these operations were recently moved to the internal state
>>> > > interface. Did the behavior change then?
>>> > >
>>> > > @Marton or Gyula, can you comment? Is it per chance not mapped to the
>>> > > partitioned state?
>>> > >
>>> > > On Fri, Oct 2, 2015 at 6:37 PM, Martin Neumann <mneumann@sics.se>
>>> wrote:
>>> > >
>>> > > > Hej,
>>> > > >
>>> > > > In one of my Programs I run a Fold on a GroupedDataStream. The
aim
>>> is
>>> > to
>>> > > > aggregate the values in each group.
>>> > > > It seems the aggregator in the Fold function is shared on operator
>>> > level,
>>> > > > so all groups that end up on the same operator get mashed together.
>>> > > >
>>> > > > Is this the wanted behavior? If so, what do I have to do to
>>> separate
>>> > > them?
>>> > > >
>>> > > >
>>> > > > cheers Martin
>>> > > >
>>> > >
>>> >
>>>
>>
>>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message