flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Aljoscha Krettek <aljos...@apache.org>
Subject Re: streaming GroupBy + Fold
Date Tue, 06 Oct 2015 06:09:01 GMT
Hi,
If you are using a fold you are using none of the new code paths. I will
add support for Fold to the new windowing implementation today, though.

Cheers,
Aljoscha

On Mon, 5 Oct 2015 at 23:49 Márton Balassi <balassi.marton@gmail.com> wrote:

> Martin, I have looked at your code and you are running a fold in a window,
> that is a very important distinction - the code paths are separate.
> Those code paths have been recently touched by Aljoscha if I am not
> mistaken.
>
> I have mocked up a simple example and could not reproduce your problem
> unfortunately. [1] Could you maybe produce a minimalistic example that we
> can actually execute? :)
>
> [1]
>
> https://github.com/mbalassi/flink/commit/9f1f02d05e2bc2043a8f514d39fbf7753ea7058d
>
> On Mon, Oct 5, 2015 at 10:06 PM, Márton Balassi <balassi.marton@gmail.com>
> wrote:
>
> > Thanks, I am checking it out tomorrow morning.
> >
> > On Mon, Oct 5, 2015 at 9:59 PM, Martin Neumann <mneumann@sics.se> wrote:
> >
> >> Hej,
> >>
> >> Sorry it took so long to respond I needed to check if I was actually
> >> allowed to share the code since it uses internal datasets.
> >>
> >> In the appendix of this email you will find the main class of this job
> >> without the supporting classes or the actual dataset. If you want to
> run it
> >> you need to replace the dataset by something else but that should be
> >> trivial.
> >> If you just want to see the problem itself, have a look at the appended
> >> log in conjunction with the code. Each ERROR printout in the log
> relates to
> >> an accumulator receiving wrong values.
> >>
> >> cheers Martin
> >>
> >> On Sat, Oct 3, 2015 at 11:29 AM, Márton Balassi <
> balassi.marton@gmail.com
> >> > wrote:
> >>
> >>> Hey,
> >>>
> >>> Thanks for reporting the problem, Martin. I have not merged the PR
> >>> Stephan
> >>> is referring to yet. [1] There I am cleaning up some of the internals
> >>> too.
> >>> Just out of curiosity, could you share the code for the failing test
> >>> please?
> >>>
> >>> [1] https://github.com/apache/flink/pull/1155
> >>>
> >>> On Fri, Oct 2, 2015 at 8:26 PM, Martin Neumann <mneumann@sics.se>
> wrote:
> >>>
> >>> > One of my colleagues found it today when we where hunting bugs today.
> >>> We
> >>> > where using the latest 0.10 version pulled from maven this morning.
> >>> > The program we where testing is new code so I cant tell you if the
> >>> behavior
> >>> > has changed or if it was always like this.
> >>> >
> >>> > On Fri, Oct 2, 2015 at 7:46 PM, Stephan Ewen <sewen@apache.org>
> wrote:
> >>> >
> >>> > > I think these operations were recently moved to the internal state
> >>> > > interface. Did the behavior change then?
> >>> > >
> >>> > > @Marton or Gyula, can you comment? Is it per chance not mapped
to
> the
> >>> > > partitioned state?
> >>> > >
> >>> > > On Fri, Oct 2, 2015 at 6:37 PM, Martin Neumann <mneumann@sics.se>
> >>> wrote:
> >>> > >
> >>> > > > Hej,
> >>> > > >
> >>> > > > In one of my Programs I run a Fold on a GroupedDataStream.
The
> aim
> >>> is
> >>> > to
> >>> > > > aggregate the values in each group.
> >>> > > > It seems the aggregator in the Fold function is shared on
> operator
> >>> > level,
> >>> > > > so all groups that end up on the same operator get mashed
> together.
> >>> > > >
> >>> > > > Is this the wanted behavior? If so, what do I have to do
to
> >>> separate
> >>> > > them?
> >>> > > >
> >>> > > >
> >>> > > > cheers Martin
> >>> > > >
> >>> > >
> >>> >
> >>>
> >>
> >>
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message