flink-dev mailing list archives

From Gyula Fóra <gyula.f...@gmail.com>
Subject Re: Periodic full stream aggregations
Date Tue, 21 Apr 2015 13:18:43 GMT
You are right, but you should never try to compute a full-stream median;
that's the point :D
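(Editor's illustration: a minimal sketch in plain Java with no Flink dependency; all names here are made up, not Flink API. It shows why the aggregation-only restriction matters: a reduce-style aggregation such as sum keeps constant state and can be emitted at every trigger, while a median forces buffering the entire stream history.)

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PeriodicAggregation {
    // Incremental sum emitted every `every` elements (a stand-in for a
    // time-based trigger): only O(1) running state is kept.
    static List<Integer> periodicSums(int[] stream, int every) {
        List<Integer> out = new ArrayList<>();
        int acc = 0, seen = 0;
        for (int x : stream) {
            acc += x;
            if (++seen % every == 0) out.add(acc);
        }
        return out;
    }

    // A median has no constant-state shortcut: the whole history must be
    // buffered and sorted, which is exactly what the restriction avoids.
    static int median(int[] stream) {
        List<Integer> history = new ArrayList<>();
        for (int x : stream) history.add(x);
        Collections.sort(history);
        return history.get(history.size() / 2); // upper median for even sizes
    }

    public static void main(String[] args) {
        int[] stream = {5, 1, 4, 2, 3, 6};
        System.out.println(periodicSums(stream, 2)); // [6, 12, 21]
        System.out.println(median(stream));          // 4
    }
}
```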

On Tue, Apr 21, 2015 at 2:52 PM, Bruno Cadonna <
cadonna@informatik.hu-berlin.de> wrote:

>
> Hi Gyula,
>
> I read the comments on your PR.
>
> I have a question about this comment:
>
> "It only allows aggregations so we dont need to keep the full history
> in a buffer."
>
> What if the user implements an aggregation function like a median?
>
> For a median you need the full history, don't you?
>
> Am I missing something?
>
> Cheers,
> Bruno
>
> On 21.04.2015 14:31, Gyula Fóra wrote:
> > I have opened a PR for this feature:
> >
> > https://github.com/apache/flink/pull/614
> >
> > Cheers, Gyula
> >
> > On Tue, Apr 21, 2015 at 1:10 PM, Gyula Fóra <gyula.fora@gmail.com>
> > wrote:
> >
> >> That's a good idea, I will modify my PR to that :)
> >>
> >> Gyula
> >>
> >> On Tue, Apr 21, 2015 at 12:09 PM, Fabian Hueske
> >> <fhueske@gmail.com> wrote:
> >>
> >>> Is it possible to switch the order of the statements, i.e.,
> >>>
> >>> dataStream.every(Time.of(4,sec)).reduce(...) instead of
> >>> dataStream.reduce(...).every(Time.of(4,sec))
> >>>
> >>> I think that would be more consistent with the structure of the
> >>> remaining API.
> >>>
> >>> Cheers, Fabian
> >>>
> >>> 2015-04-21 10:57 GMT+02:00 Gyula Fóra <gyfora@apache.org>:
> >>>
> >>>> Hi Bruno,
> >>>>
> >>>> Of course you can do that as well. (That's the good part :p
> >>>> )
> >>>>
> >>>> I will open a PR soon with the proposed changes (first without
> >>>> breaking the current API) and I will post it here.
> >>>>
> >>>> Cheers, Gyula
> >>>>
> >>>> On Tuesday, April 21, 2015, Bruno Cadonna <
> >>>> cadonna@informatik.hu-berlin.de> wrote:
> >>>>
> > Hi Gyula,
> >
> > I have a question regarding your suggestion.
> >
> > Can the current continuous aggregation be also specified with your
> > proposed periodic aggregation?
> >
> > I am thinking about something like
> >
> > dataStream.reduce(...).every(Count.of(1))
> >
> > Cheers, Bruno
> >
> > On 20.04.2015 22:32, Gyula Fóra wrote:
> >>>>>>> Hey all,
> >>>>>>>
> >>>>>>> I think we are missing a quite useful feature that
> >>>>>>> could be implemented (with some slight modifications)
> >>>>>>> on top of the current windowing api.
> >>>>>>>
> >>>>>>> We currently provide two ways of aggregating (or
> >>>>>>> reducing) over streams: doing a continuous aggregation
> >>>>>>> and always outputting the aggregated value (which cannot
> >>>>>>> be done properly in parallel), or doing the aggregation
> >>>>>>> periodically in a window.
> >>>>>>>
> >>>>>>> What we don't have at the moment is periodic
> >>>>>>> aggregations on the whole stream. I would even go as
> >>>>>>> far as to remove the continuous outputting
> >>>>>>> reduce/aggregate and replace it with this version,
> >>>>>>> as this in turn can be done properly in parallel.
> >>>>>>>
> >>>>>>> My suggestion would be that a call:
> >>>>>>>
> >>>>>>> dataStream.reduce(...) or dataStream.sum(...)
> >>>>>>>
> >>>>>>> would return a windowed data stream where the window is
> >>>>>>> the whole record history, and the user would need to
> >>>>>>> define a trigger to get the actual reduced values
> >>>>>>> like:
> >>>>>>>
> >>>>>>> dataStream.reduce(...).every(Time.of(4,sec))
> >>>>>>> dataStream.sum(...).every(...)
> >>>>>>>
> >>>>>>> I think the current data stream reduce/aggregation is
> >>>>>>> very confusing without being practical for any normal
> >>>>>>> use-case.
> >>>>>>>
> >>>>>>> Also, this would be a very API-breaking change (but I
> >>>>>>> would still make it, as it is much more intuitive than
> >>>>>>> the current behaviour), so I would try to push it before
> >>>>>>> the release if we can agree.
> >>>>>>>
> >>>>>>> Cheers, Gyula
> >>>>>>>
> >
> >>>>>
> >>>>
> >>>
> >>
> >>
> >
>
> - --
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>   Dr. Bruno Cadonna
>   Postdoctoral Researcher
>
>   Databases and Information Systems
>   Department of Computer Science
>   Humboldt-Universität zu Berlin
>
>   http://www.informatik.hu-berlin.de/~cadonnab
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
