flink-dev mailing list archives

From Aljoscha Krettek <aljos...@apache.org>
Subject Re: Thoughts About Streaming
Date Thu, 25 Jun 2015 16:11:44 GMT
+1 go ahead

On Thu, 25 Jun 2015 at 18:02 Stephan Ewen <sewen@apache.org> wrote:

> Hey!
>
> This thread covers many different topics. Let's break this up into separate
> discussions.
>
>  - Operator State is already driven by Gyula and Paris and happening on the
> above mentioned pull request and the followup discussions.
>
>  - For windowing, this discussion has brought some results that we should
> sum up and clearly write down.
>    I would like to chime in to do that based on what I learned from the
> document and this discussion. I also got some input from Marton about what
> he learned from mapping the Cloud DataFlow constructs to Flink.
>    I'll draft a Wiki page (with the help of Aljoscha, Marton) that sums
> this up and documents it for users (if we decide to adopt this).
>    Then we run this by Gyula, Matthias Sax and Kostas for feedback.
>
>  - API style discussions should be yet another thread. This will probably
> be opened as people start to address that.
>
>
> I'll try to get a draft of the wiki version out tomorrow noon and send the
> link around.
>
> Greetings,
> Stephan
>
>
>
> On Thu, Jun 25, 2015 at 3:51 PM, Matthias J. Sax <
> mjsax@informatik.hu-berlin.de> wrote:
>
> > Sure. I picked this up. Using the current model for "occurrence time
> > semantics" does not work.
> >
> > I elaborated on this in the past many times (but nobody cared). It is
> > important to make it clear to the user what semantics are supported.
> > Claiming to support "sliding windows" doesn't mean anything; there are
> > too many different semantics out there. :)
> >
> >
> > On 06/25/2015 03:35 PM, Aljoscha Krettek wrote:
> > > Yes, I am aware of this requirement and it would also be supported in
> my
> > > proposed model.
> > >
> > > The problem is, that the "custom timestamp" feature gives the
> impression
> > > that the elements would be windowed according to a user-timestamp. The
> > > results, however, are wrong because of the assumption about elements
> > > arriving in order. (This is what I was trying to show with my fancy
> ASCII
> > > art and result output.)
> > >
> > > On Thu, 25 Jun 2015 at 15:26 Matthias J. Sax <
> > mjsax@informatik.hu-berlin.de>
> > > wrote:
> > >
> > >> Hi Aljoscha,
> > >>
> > >> I like that you are pushing in this direction. However, IMHO you
> > >> misinterpret the current approach. It does not assume that tuples
> > >> arrive in-order; the current approach has no notion about a
> > >> pre-defined-order (for example, the order in which the event are
> > >> created). There is only the notion of "arrival-order" at the operator.
> > >> From this "arrival-order" perspective, the results are correct(!).
> > >>
> > >> Windowing in the current approach means for example, "sum up an
> > >> attribute of all events you *received* in the last 5 seconds". That
> is a
> > >> different meaning than "sum up an attribute of all events that
> *occurred*
> > >> in the last 5 seconds". Both queries are valid and Flink should
> support
> > >> both IMHO.
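[Editorial sketch] The two semantics Matthias distinguishes can be put side by side in a few lines of plain Java (illustrative names only, not Flink API): an event carries both the time it occurred and the time it arrived at the operator, and the two queries filter on different fields.

```java
import java.util.List;

// Illustrative sketch (class and method names are mine, not Flink's):
// the same five-second sum under the two window semantics.
public class WindowSemantics {

    public static class Event {
        public final long occurredAt; // when the event happened
        public final long arrivedAt;  // when the operator received it
        public final long value;

        public Event(long occurredAt, long arrivedAt, long value) {
            this.occurredAt = occurredAt;
            this.arrivedAt = arrivedAt;
            this.value = value;
        }
    }

    // "Sum up all events you *received* in [start, start + len)."
    public static long arrivalTimeSum(List<Event> events, long start, long len) {
        return events.stream()
                .filter(e -> e.arrivedAt >= start && e.arrivedAt < start + len)
                .mapToLong(e -> e.value)
                .sum();
    }

    // "Sum up all events that *occurred* in [start, start + len)."
    public static long eventTimeSum(List<Event> events, long start, long len) {
        return events.stream()
                .filter(e -> e.occurredAt >= start && e.occurredAt < start + len)
                .mapToLong(e -> e.value)
                .sum();
    }

    public static void main(String[] args) {
        // The middle event occurred at t=4 but arrived late, at t=6.
        List<Event> events = List.of(
                new Event(1, 1, 10), new Event(4, 6, 20), new Event(7, 7, 30));
        System.out.println(arrivalTimeSum(events, 0, 5)); // late event missed
        System.out.println(eventTimeSum(events, 0, 5));   // late event counted
    }
}
```

The two results differ exactly when an event arrives outside the window in which it occurred, which is the out-of-order case discussed throughout this thread.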
> > >>
> > >>
> > >> -Matthias
> > >>
> > >>
> > >>
> > >> On 06/25/2015 03:03 PM, Aljoscha Krettek wrote:
> > >>> Yes, now this also processes about 3 mio elements (Window Size 5 sec,
> > >> Slide
> > >>> 1 sec) but it still fluctuates a lot between 1 mio. and 5 mio.
> > >>>
> > >>> Performance is not my main concern, however. My concern is that the
> > >> current
> > >>> model assumes elements to arrive in order, which is simply not true.
> > >>>
> > >>> In your code you have these lines for specifying the window:
> > >>> .window(Time.of(1l, TimeUnit.SECONDS))
> > >>> .every(Time.of(1l, TimeUnit.SECONDS))
> > >>>
> > >>> Although this semantically specifies a tumbling window of size 1 sec
> > I'm
> > >>> afraid it uses the sliding window logic internally (because of the
> > >>> .every()).
> > >>>
> > >>> In my tests I only have the first line.
> > >>>
> > >>>
> > >>> On Thu, 25 Jun 2015 at 14:32 Gábor Gévay <ggab90@gmail.com>
wrote:
> > >>>
> > >>>> I'm very sorry, I had a bug in the InversePreReducer. It should
be
> > >>>> fixed now. Can you please run it again?
> > >>>>
> > >>>> I also tried to reproduce some of your performance numbers, but
I'm
> > >>>> getting only less than 1/10th of yours. For example, in the Tumbling
> > >>>> case, Current/Reduce produces only ~100000 for me. Do you have
any
> > >>>> idea what I could be doing wrong? My code:
> > >>>> http://pastebin.com/zbEjmGhk
> > >>>> I am running it on a 2 GHz Core i7.
> > >>>>
> > >>>> Best regards,
> > >>>> Gabor
> > >>>>
> > >>>>
> > >>>> 2015-06-25 12:31 GMT+02:00 Aljoscha Krettek <aljoscha@apache.org>:
> > >>>>> Hi,
> > >>>>> I also ran the tests on top of PR 856 (inverse reducer) now.
The
> > >> results
> > >>>>> seem incorrect. When I insert a Thread.sleep(1) in the tuple
> source,
> > >> all
> > >>>>> the previous tests reported around 3600 tuples (Size 5 sec,
Slide 1
> > >> sec)
> > >>>>> (Theoretically there would be 5000 tuples in 5 seconds but
the shortfall is
> > due
> > >> to
> > >>>>> overhead). These are the results for the inverse reduce
> optimisation:
> > >>>>> (Tuple 0,38)
> > >>>>> (Tuple 0,829)
> > >>>>> (Tuple 0,1625)
> > >>>>> (Tuple 0,2424)
> > >>>>> (Tuple 0,3190)
> > >>>>> (Tuple 0,3198)
> > >>>>> (Tuple 0,-339368)
> > >>>>> (Tuple 0,-1315725)
> > >>>>> (Tuple 0,-2932932)
> > >>>>> (Tuple 0,-5082735)
> > >>>>> (Tuple 0,-7743256)
> > >>>>> (Tuple 0,75701046)
> > >>>>> (Tuple 0,642829470)
> > >>>>> (Tuple 0,2242018381)
> > >>>>> (Tuple 0,5190708618)
> > >>>>> (Tuple 0,10060360311)
> > >>>>> (Tuple 0,-94254951)
> > >>>>> (Tuple 0,-219806321293)
> > >>>>> (Tuple 0,-1258895232699)
> > >>>>> (Tuple 0,-4074432596329)
> > >>>>>
> > >>>>> One line is one emitted window count. This is what happens
when I
> > >> remove
> > >>>>> the Thread.sleep(1):
> > >>>>> (Tuple 0,660676)
> > >>>>> (Tuple 0,2553733)
> > >>>>> (Tuple 0,3542696)
> > >>>>> (Tuple 0,1)
> > >>>>> (Tuple 0,1107035)
> > >>>>> (Tuple 0,2549491)
> > >>>>> (Tuple 0,4100387)
> > >>>>> (Tuple 0,-8406583360092)
> > >>>>> (Tuple 0,-8406582150743)
> > >>>>> (Tuple 0,-8406580427190)
> > >>>>> (Tuple 0,-8406580427190)
> > >>>>> (Tuple 0,-8406580427190)
> > >>>>> (Tuple 0,6847279255682044995)
> > >>>>> (Tuple 0,6847279255682044995)
> > >>>>> (Tuple 0,-5390528042713628318)
> > >>>>> (Tuple 0,-5390528042711551780)
> > >>>>> (Tuple 0,-5390528042711551780)
> > >>>>>
> > >>>>> So at some point the pre-reducer seems to go haywire and does
not
> > >> recover
> > >>>>> from it. The good thing is that it does produce results now,
where
> > the
> > >>>>> previous Current/Reduce would simply hang and not produce any
> output.
> > >>>>>
> > >>>>> On Thu, 25 Jun 2015 at 12:02 Gábor Gévay <ggab90@gmail.com>
wrote:
> > >>>>>
> > >>>>>> Hello,
> > >>>>>>
> > >>>>>> Aljoscha, can you please try the performance test of
> Current/Reduce
> > >>>>>> with the InversePreReducer in PR 856? (If you just call
sum, it
> will
> > >>>>>> use an InversePreReducer.) It would be an interesting test,
> because
> > >>>>>> the inverse function optimization really depends on the
stream
> being
> > >>>>>> ordered, and I think it has the potential of being faster
than
> > >>>>>> Next/Reduce. Especially if the window size is much larger
than the
> > >>>>>> slide size.
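[Editorial sketch] The inverse-function idea can be shown with a count-based sliding sum (illustrative names, not the actual InversePreReducer from the PR): because sum has an inverse, a slide subtracts the evicted elements instead of re-aggregating the whole window.

```java
import java.util.ArrayDeque;

// Minimal sketch of an inverse pre-reducer for an invertible reduce
// like sum (count-based window, names are mine, not the PR's classes):
// on eviction, apply the inverse of the reduce function instead of
// recomputing the aggregate from scratch.
public class InverseSlidingSum {
    private final ArrayDeque<Long> window = new ArrayDeque<>();
    private final int size;
    private long sum = 0;

    public InverseSlidingSum(int size) {
        this.size = size;
    }

    // O(1) per element regardless of window size: one add, at most one subtract.
    public long add(long value) {
        window.addLast(value);
        sum += value;                    // reduce with the new element
        if (window.size() > size) {
            sum -= window.removeFirst(); // inverse reduce on the evicted element
        }
        return sum;
    }
}
```

The constant cost per slide is why this can beat re-aggregation when the window is much larger than the slide; but the element subtracted is the oldest by arrival, so with out-of-order timestamps the wrong element can be removed, which is the ordering assumption mentioned above.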
> > >>>>>>
> > >>>>>> Best regards,
> > >>>>>> Gabor
> > >>>>>>
> > >>>>>>
> > >>>>>> 2015-06-25 11:36 GMT+02:00 Aljoscha Krettek <aljoscha@apache.org
> >:
> > >>>>>>> I think I'll have to elaborate a bit so I created a
> > proof-of-concept
> > >>>>>>> implementation of my ideas and ran some throughput
measurements
> to
> > >>>>>>> alleviate concerns about performance.
> > >>>>>>>
> > >>>>>>> First, though, I want to highlight again why the current
approach
> > >> does
> > >>>>>> not
> > >>>>>>> work with out-of-order elements (which, again, occur
constantly
> due
> > >> to
> > >>>>>> the
> > >>>>>>> distributed nature of the system). This is the example
I posted
> > >>>> earlier:
> > >>>>>>> https://gist.github.com/aljoscha/a367012646ab98208d27.
The plan
> > >> looks
> > >>>>>> like
> > >>>>>>> this:
> > >>>>>>>
> > >>>>>>> +--+
> > >>>>>>> | | Source
> > >>>>>>> +--+
> > >>>>>>> |
> > >>>>>>> +-----+
> > >>>>>>> | |
> > >>>>>>> | +--+
> > >>>>>>> | | | Identity Map
> > >>>>>>> | +--+
> > >>>>>>> | |
> > >>>>>>> +-----+
> > >>>>>>> |
> > >>>>>>> +--+
> > >>>>>>> | | Window
> > >>>>>>> +--+
> > >>>>>>> |
> > >>>>>>> |
> > >>>>>>> +--+
> > >>>>>>> | | Sink
> > >>>>>>> +--+
> > >>>>>>>
> > >>>>>>> So all it does is pass the elements through an identity
map and
> > then
> > >>>>>> merge
> > >>>>>>> them again before the window operator. The source emits
ascending
> > >>>>>> integers
> > >>>>>>> and the window operator has a custom timestamp extractor
that
> uses
> > >> the
> > >>>>>>> integer itself as the timestamp and should create windows
of
> size 4
> > >>>> (that
> > >>>>>>> is elements with timestamp 0-3 are one window, the
next are the
> > >>>> elements
> > >>>>>>> with timestamp 4-7, and so on). Since the topology
basically
> > doubles
> > >>>> the
> > >>>>>>> elements from the source I would expect to get these
windows:
> > >>>>>>> Window: 0, 0, 1, 1, 2, 2, 3, 3
> > >>>>>>> Window: 4, 4, 5, 5, 6, 6, 7, 7
> > >>>>>>>
> > >>>>>>> The output is this, however:
> > >>>>>>> Window: 0, 1, 2, 3,
> > >>>>>>> Window: 4, 0, 1, 2, 3, 4, 5, 6, 7,
> > >>>>>>> Window: 8, 9, 10, 11,
> > >>>>>>> Window: 12, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
> > >>>>>>> Window: 16, 17, 18, 19,
> > >>>>>>> Window: 20, 21, 13, 14, 15, 16, 17, 18, 19, 20, 21,
22, 23,
> > >>>>>>> Window: 24, 25, 26, 27,
> > >>>>>>> Window: 28, 29, 30, 22, 23, 24, 25, 26, 27, 28, 29,
30, 31,
> > >>>>>>>
> > >>>>>>> The reason is that the elements simply arrive out-of-order.
> Imagine
> > >>>> what
> > >>>>>>> would happen if the elements actually arrived with
some delay
> from
> > >>>>>>> different operations.
> > >>>>>>>
> > >>>>>>> Now, on to the performance numbers. The proof-of-concept
I
> created
> > is
> > >>>>>>> available here:
> > >>>>>>> https://github.com/aljoscha/flink/tree/event-time-window-fn-mock
> .
> > >> The
> > >>>>>> basic
> > >>>>>>> idea is that sources assign the current timestamp when
emitting
> > >>>> elements.
> > >>>>>>> They also periodically emit watermarks that tell us
that no
> > elements
> > >>>> with
> > >>>>>>> an earlier timestamp will be emitted. The watermarks
propagate
> > >> through
> > >>>>>> the
> > >>>>>>> operators. The window operator looks at the timestamp
of an
> element
> > >>>> and
> > >>>>>>> puts it into the buffer that corresponds to that window.
When the
> > >>>> window
> > >>>>>>> operator receives a watermark it will look at the in-flight
> windows
> > >>>>>>> (basically the buffers) and emit those windows where
the
> window-end
> > >> is
> > >>>>>>> before the watermark.
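[Editorial sketch] The mechanics just described can be sketched as a toy tumbling-window operator (class and method names are mine, not the proof-of-concept branch's actual code): elements are bucketed by timestamp, and a watermark releases every in-flight window whose end it has passed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy event-time window operator: buckets elements into tumbling windows
// by timestamp and emits a window once a watermark passes its end.
public class EventTimeWindows {
    private final long windowSize;
    // window start -> buffered elements, kept sorted so windows emit in time order
    private final TreeMap<Long, List<Long>> buffers = new TreeMap<>();

    public EventTimeWindows(long windowSize) {
        this.windowSize = windowSize;
    }

    // Out-of-order elements are fine: the timestamp, not the arrival
    // order, decides which buffer an element lands in.
    public void element(long timestamp, long value) {
        long windowStart = timestamp - (timestamp % windowSize);
        buffers.computeIfAbsent(windowStart, k -> new ArrayList<>()).add(value);
    }

    // A watermark promises that no element with an earlier timestamp will
    // arrive, so every window ending at or before it is complete.
    public List<List<Long>> watermark(long watermark) {
        List<List<Long>> complete = new ArrayList<>();
        while (!buffers.isEmpty() && buffers.firstKey() + windowSize <= watermark) {
            complete.add(buffers.pollFirstEntry().getValue());
        }
        return complete;
    }
}
```

A window is held open only until the watermark passes it, so late-but-not-too-late elements still land in the correct window, unlike the arrival-order windowing shown in the example output above.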
> > >>>>>>>
> > >>>>>>> For measuring throughput I did the following: The source
emits
> > tuples
> > >>>> of
> > >>>>>>> the form ("tuple", 1) in an infinite loop. The window
operator
> sums
> > >> up
> > >>>>>> the
> > >>>>>>> tuples, thereby counting how many tuples the window
operator can
> > >>>> handle
> > >>>>>> in
> > >>>>>>> a given time window. There are two different implementations
for
> > the
> > >>>>>>> summation: 1) simply summing up the values in a mapWindow(),
> there
> > >> you
> > >>>>>> get
> > >>>>>>> a List of all tuples and simple iterate over it. 2)
using sum(1),
> > >>>> which
> > >>>>>> is
> > >>>>>>> implemented as a reduce() (that uses the pre-reducer
> > optimisations).
> > >>>>>>>
> > >>>>>>> These are the performance numbers (Current is the current
> > >>>> implementation,
> > >>>>>>> Next is my proof-of-concept):
> > >>>>>>>
> > >>>>>>> Tumbling (1 sec):
> > >>>>>>>  - Current/Map: 1.6 mio
> > >>>>>>>  - Current/Reduce: 2 mio
> > >>>>>>>  - Next/Map: 2.2 mio
> > >>>>>>>  - Next/Reduce: 4 mio
> > >>>>>>>
> > >>>>>>> Sliding (5 sec, slide 1 sec):
> > >>>>>>>  - Current/Map: ca 3 mio (fluctuates a lot)
> > >>>>>>>  - Current/Reduce: No output
> > >>>>>>>  - Next/Map: ca 4 mio (fluctuates)
> > >>>>>>>  - Next/Reduce: 10 mio
> > >>>>>>>
> > >>>>>>> The Next/Reduce variant can basically scale indefinitely
with
> > window
> > >>>> size
> > >>>>>>> because the internal state does not rely on the number
of
> elements
> > >>>> (it is
> > >>>>>>> just the current sum). The pre-reducer for sliding
windows
> cannot
> > >>>> handle
> > >>>>>>> the amount of tuples, it produces no output. For the
two Map
> > variants
> > >>>> the
> > >>>>>>> performance fluctuates because they always keep all
the elements
> in
> > >> an
> > >>>>>>> internal buffer before emission, this seems to tax
the garbage
> > >>>> collector
> > >>>>>> a
> > >>>>>>> bit and leads to random pauses.
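[Editorial sketch] The state-size difference can be made concrete (illustrative classes, not actual Flink code): a pre-reducing window folds each element into one running value, while a mapWindow-style buffer keeps every element alive until the window fires.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongBinaryOperator;

// Illustrative contrast: constant-size state vs. buffer-all state.
public class WindowState {

    public static class PreReducing {
        private final LongBinaryOperator reducer;
        private Long state = null; // one value, regardless of element count

        public PreReducing(LongBinaryOperator reducer) {
            this.reducer = reducer;
        }

        public void element(long v) {
            state = (state == null) ? v : reducer.applyAsLong(state, v);
        }

        public long emit() { // emit the window result and reset
            long result = state;
            state = null;
            return result;
        }
    }

    public static class Buffering {
        private final List<Long> buffer = new ArrayList<>(); // grows per element

        public void element(long v) {
            buffer.add(v);
        }

        public long emitSum() {
            long sum = 0;
            for (long v : buffer) {
                sum += v;
            }
            buffer.clear(); // all buffered elements become garbage at once
            return sum;
        }
    }
}
```

The buffering variant's per-element allocations and bulk release on emission are one plausible source of the garbage-collection pauses and fluctuating throughput reported for the Map variants.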
> > >>>>>>>
> > >>>>>>> One thing that should be noted is that I had to disable
the
> > >>>> fake-element
> > >>>>>>> emission thread, otherwise the Current versions would
deadlock.
> > >>>>>>>
> > >>>>>>> So, I started working on this because I thought that
out-of-order
> > >>>>>>> processing would be necessary for correctness. And
it certainly is.
> > >>>> But
> > >>>>>> the
> > >>>>>>> proof-of-concept also shows that performance can be
greatly
> > improved.
> > >>>>>>>
> > >>>>>>> On Wed, 24 Jun 2015 at 09:46 Gyula Fóra <gyula.fora@gmail.com>
> > >> wrote:
> > >>>>>>>>
> > >>>>>>>> I agree lets separate these topics from each other
so we can get
> > >>>> faster
> > >>>>>>>> resolution.
> > >>>>>>>>
> > >>>>>>>> There is already a state discussion in the thread
we started
> with
> > >>>> Paris.
> > >>>>>>>>
> > >>>>>>>> On Wed, Jun 24, 2015 at 9:24 AM Kostas Tzoumas
<
> > ktzoumas@apache.org
> > >>>
> > >>>>>>> wrote:
> > >>>>>>>>
> > >>>>>>>>> I agree with supporting out-of-order out of
the box :-), even
> if
> > >>>> this
> > >>>>>>> means
> > >>>>>>>>> a major refactoring. This is the right time
to refactor the
> > >>>> streaming
> > >>>>>>> API
> > >>>>>>>>> before we pull it out of beta. I think that
this is more
> > important
> > >>>>>> than
> > >>>>>>> new
> > >>>>>>>>> features in the streaming API, which can be
prioritized once
> the
> > >>>> API
> > >>>>>> is
> > >>>>>>> out
> > >>>>>>>>> of beta (meaning, that IMO this is the right
time to stall PRs
> > >>>> until
> > >>>>>> we
> > >>>>>>>>> agree on the design).
> > >>>>>>>>>
> > >>>>>>>>> There are three sections in the document: windowing,
state, and
> > >>>> API.
> > >>>>>> How
> > >>>>>>>> intertwined are those with each other? Can we
separate the
> > >>>> discussion
> > >>>>>> or
> > >>>>>>> do
> > >>>>>>>>> we need to discuss those all together? I think
part of the
> > >>>> difficulty
> > >>>>>> is
> > >>>>>>>>> that we are discussing three design choices
at once.
> > >>>>>>>>>
> > >>>>>>>>> On Tue, Jun 23, 2015 at 7:43 PM, Ted Dunning
<
> > >>>> ted.dunning@gmail.com>
> > >>>>>>>>> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> Out of order is ubiquitous in the real world.
 Typically, what
> > >>>>>>> happens is
> > >>>>>>>>>> that businesses will declare a maximum
allowable delay for
> > >>>> delayed
> > >>>>>>>>>> transactions and will commit to results
when that delay is
> > >>>> reached.
> > >>>>>>>>>> Transactions that arrive later than this
cutoff are collected
> > >>>>>>> specially
> > >>>>>>>>> as
> > >>>>>>>>>> corrections which are reported/used when
possible.
> > >>>>>>>>>>
> > >>>>>>>>>> Clearly, ordering can also be violated
during processing, but
> if
> > >>>> the
> > >>>>>>> data
> > >>>>>>>>>> is originally out of order the situation
can't be repaired by
> > any
> > >>>>>>>>> protocol
> > >>>>>>>>>> fixes that prevent transactions from becoming
disordered but
> has
> > >>>> to
> > >>>>>>>>> be handled
> > >>>>>>>>>> at the data level.
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> On Tue, Jun 23, 2015 at 9:29 AM, Aljoscha
Krettek <
> > >>>>>> aljoscha@apache.org
> > >>>>>>>>
> > >>>>>>>>>> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>> I also don't like big changes but sometimes
they are
> necessary.
> > >>>>>> The
> > >>>>>>>>>> reason
> > >>>>>>>>>>> why I'm so adamant about out-of-order
processing is that
> > >>>>>>> out-of-order
> > >>>>>>>>>>> elements are not some exception that
occurs once in a while;
> > >>>> they
> > >>>>>>> occur
> > >>>>>>>>>>> constantly in a distributed system.
For example, in this:
> > >>>>>>>>>>> https://gist.github.com/aljoscha/a367012646ab98208d27
the
> > >>>>>> resulting
> > >>>>>>>>>>> windows
> > >>>>>>>>>>> are completely bogus because the current
windowing system
> > >>>> assumes
> > >>>>>>>>>> elements
> > >>>>>>>>>>> to globally arrive in order, which
is simply not true. (The
> > >>>>>> example
> > >>>>>>>>> has a
> > >>>>>>>>>>> source that generates increasing integers.
Then these pass
> > >>>>>> through a
> > >>>>>>>>> map
> > >>>>>>>>>>> and are unioned with the original DataStream
before a window
> > >>>>>>> operator.)
> > >>>>>>>>>>> This simulates elements arriving from
different operators at
> a
> > >>>>>>>>> windowing
> > >>>>>>>>>>> operator. The example is also DOP=1,
I imagine this to get
> > >>>> worse
> > >>>>>>> with
> > >>>>>>>>>>> higher DOP.
> > >>>>>>>>>>>
> > >>>>>>>>>>> What do you mean by costly? As I said,
I have a
> > >>>> proof-of-concept
> > >>>>>>>>>> windowing
> > >>>>>>>>>>> operator that can handle out-of-order
elements. This is an
> > >>>> example
> > >>>>>>>>> using
> > >>>>>>>>>>> the current Flink API:
> > >>>>>>>>>>> https://gist.github.com/aljoscha/f8dce0691732e344bbe8.
> > >>>>>>>>>>> (It is an infinite source of tuples
and a 5 second window
> > >>>> operator
> > >>>>>>> that
> > >>>>>>>>>>> counts the tuples.) The first problem
is that this code
> > >>>> deadlocks
> > >>>>>>>>> because
> > >>>>>>>>>>> of the thread that emits fake elements.
If I disable the fake
> > >>>>>>> element
> > >>>>>>>>>> code
> > >>>>>>>>>>> it works, but the throughput using
my mockup is 4 times
> higher.
> > >>>>>> The
> > >>>>>>>>> gap
> > >>>>>>>>>>> widens dramatically if the window size
increases.
> > >>>>>>>>>>>
> > >>>>>>>>>>> So, it actually increases performance
(unless I'm making a
> > >>>> mistake
> > >>>>>>> in
> > >>>>>>>>> my
> > >>>>>>>>>>> explorations) and can handle elements
that arrive
> out-of-order
> > >>>>>>> (which
> > >>>>>>>>>>> happens basically always in a real-world
windowing
> use-cases).
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Tue, 23 Jun 2015 at 12:51 Stephan
Ewen <sewen@apache.org>
> > >>>>>> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> What I like a lot about Aljoscha's
proposed design is that
> we
> > >>>>>>> need no
> > >>>>>>>>>>>> different code for "system time"
vs. "event time". It only
> > >>>>>>> differs in
> > >>>>>>>>>>> where
> > >>>>>>>>>>>> the timestamps are assigned.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> The OOP approach also gives you
the semantics of total
> > >>>> ordering
> > >>>>>>>>> without
> > >>>>>>>>>>>> imposing merges on the streams.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On Tue, Jun 23, 2015 at 10:43 AM,
Matthias J. Sax <
> > >>>>>>>>>>>> mjsax@informatik.hu-berlin.de>
wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> I agree that there should be
multiple alternatives the
> > >>>> user(!)
> > >>>>>>> can
> > >>>>>>>>>>>>> choose from. Partial out-of-order
processing works for
> > >>>>>> many/most
> > >>>>>>>>>>>>> aggregates. However, if you
consider
> > >>>> Event-Pattern-Matching,
> > >>>>>>> global
> > >>>>>>>>>>>>> ordering is necessary (even
if the performance penalty
> > >>>> might
> > >>>>>> be
> > >>>>>>>>>> high).
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> I would also keep "system-time
windows" as an alternative
> > >>>> to
> > >>>>>>>>> "source
> > >>>>>>>>>>>>> assigned ts-windows".
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> It might also be interesting
to consider the following
> > >>>> paper
> > >>>>>> for
> > >>>>>>>>>>>>> overlapping windows: "Resource
sharing in continuous
> > >>>>>>> sliding-window
> > >>>>>>>>>>>>> aggregates"
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>> https://dl.acm.org/citation.cfm?id=1316720
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> -Matthias
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> On 06/23/2015 10:37 AM, Gyula
Fóra wrote:
> > >>>>>>>>>>>>>> Hey
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> I think we should not block
PRs unnecessarily if your
> > >>>>>>> suggested
> > >>>>>>>>>>> changes
> > >>>>>>>>>>>>>> might touch them at some
point.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Also I still think we should
not put everything in the
> > >>>>>>> Datastream
> > >>>>>>>>>>>> because
> > >>>>>>>>>>>>>> it will be a huge mess.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Also we need to agree on
the out of order processing,
> > >>>>>> whether
> > >>>>>>> we
> > >>>>>>>>>> want
> > >>>>>>>>>>>> it
> > >>>>>>>>>>>>>> the way you proposed it(which
is quite costly). Another
> > >>>>>>>>> alternative
> > >>>>>>>>>>>>>> approach there which fits
in the current windowing is to
> > >>>>>>> filter
> > >>>>>>>>> out
> > >>>>>>>>>>> of
> > >>>>>>>>>>>>>> order events and apply
a special handling operator on
> > >>>> them.
> > >>>>>>> This
> > >>>>>>>>>>> would
> > >>>>>>>>>>>> be
> > >>>>>>>>>>>>>> fairly lightweight.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> My point is that we need
to consider some alternative
> > >>>>>>> solutions.
> > >>>>>>>>>> And
> > >>>>>>>>>>> we
> > >>>>>>>>>>>>>> should not block contributions
along the way.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Cheers
> > >>>>>>>>>>>>>> Gyula
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> On Tue, Jun 23, 2015 at
9:55 AM Aljoscha Krettek <
> > >>>>>>>>>>> aljoscha@apache.org>
> > >>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> The reason I posted
this now is that we need to think
> > >>>> about
> > >>>>>>> the
> > >>>>>>>>>> API
> > >>>>>>>>>>>> and
> > >>>>>>>>>>>>>>> windowing before proceeding
with the PRs of Gabor
> > >>>> (inverse
> > >>>>>>>>> reduce)
> > >>>>>>>>>>> and
> > >>>>>>>>>>>>>>> Gyula (removal of "aggregate"
functions on DataStream).
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> For the windowing,
I think that the current model does
> > >>>> not
> > >>>>>>> work
> > >>>>>>>>>> for
> > >>>>>>>>>>>>>>> out-of-order processing.
Therefore, the whole windowing
> > >>>>>>>>>>> infrastructure
> > >>>>>>>>>>>>> will
> > >>>>>>>>>>>>>>> basically have to be
redone. Meaning also that any work
> > >>>> on
> > >>>>>>> the
> > >>>>>>>>>>>>>>> pre-aggregators or
optimizations that we do now becomes
> > >>>>>>> useless.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> For the API, I proposed
to restructure the interactions
> > >>>>>>> between
> > >>>>>>>>>> all
> > >>>>>>>>>>>> the
> > >>>>>>>>>>>>>>> different *DataStream
classes and grouping/windowing.
> > >>>> (See
> > >>>>>>> API
> > >>>>>>>>>>> section
> > >>>>>>>>>>>>> of
> > >>>>>>>>>>>>>>> the doc I posted.)
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> On Mon, 22 Jun 2015
at 21:56 Gyula Fóra <
> > >>>>>> gyula.fora@gmail.com
> > >>>>>>>>
> > >>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Hi Aljoscha,
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Thanks for the
nice summary, this is a very good
> > >>>>>> initiative.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> I added some comments
to the respective sections
> > >>>> (where I
> > >>>>>>> didn't
> > >>>>>>>>>>> fully
> > >>>>>>>>>>>>>>> agree
> > >>>>>>>>>>>>>>>> :).).
> > >>>>>>>>>>>>>>>> At some point I
think it would be good to have a public
> > >>>>>>> hangout
> > >>>>>>>>>>>> session
> > >>>>>>>>>>>>>>> on
> > >>>>>>>>>>>>>>>> this, which could
make a more dynamic discussion.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Cheers,
> > >>>>>>>>>>>>>>>> Gyula
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Aljoscha Krettek
<aljoscha@apache.org> ezt írta
> > >>>> (időpont:
> > >>>>>>>>> 2015.
> > >>>>>>>>>>> jún.
> > >>>>>>>>>>>>>>> 22.,
> > >>>>>>>>>>>>>>>> H, 21:34):
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Hi,
> > >>>>>>>>>>>>>>>>> with people
proposing changes to the streaming part I
> > >>>>>> also
> > >>>>>>>>>> wanted
> > >>>>>>>>>>> to
> > >>>>>>>>>>>>>>>> throw
> > >>>>>>>>>>>>>>>>> my hat into
the ring. :D
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> During the
last few months, while I was getting
> > >>>>>> acquainted
> > >>>>>>>>> with
> > >>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>> streaming system,
I wrote down some thoughts I had
> > >>>> about
> > >>>>>>> how
> > >>>>>>>>>>> things
> > >>>>>>>>>>>>>>> could
> > >>>>>>>>>>>>>>>>> be improved.
Hopefully, they are in somewhat coherent
> > >>>>>> shape
> > >>>>>>>>> now,
> > >>>>>>>>>>> so
> > >>>>>>>>>>>>>>>> please
> > >>>>>>>>>>>>>>>>> have a look
if you are interested in this:
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>
> > >>
> >
> https://docs.google.com/document/d/1rSoHyhUhm2IE30o5tkR8GEetjFvMRMNxvsCfoPsW6_4/edit?usp=sharing
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> This mostly
covers:
> > >>>>>>>>>>>>>>>>>  - Timestamps
assigned at sources
> > >>>>>>>>>>>>>>>>>  - Out-of-order
processing of elements in window
> > >>>>>> operators
> > >>>>>>>>>>>>>>>>>  - API design
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Please let
me know what you think. Comment in the
> > >>>>>> document
> > >>>>>>> or
> > >>>>>>>>>> here
> > >>>>>>>>>>>> in
> > >>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>> mailing list.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> I have a PR
in the makings that would introduce source
> > >>>>>>>>>> timestamps
> > >>>>>>>>>>>> and
> > >>>>>>>>>>>>>>>>> watermarks
for keeping track of them. I also hacked a
> > >>>>>>>>>>>> proof-of-concept
> > >>>>>>>>>>>>>>>> of a
> > >>>>>>>>>>>>>>>>> windowing system
that is able to process out-of-order
> > >>>>>>> elements
> > >>>>>>>>>>>> using a
> > >>>>>>>>>>>>>>>>> FlatMap operator.
(It uses panes to perform efficient
> > >>>>>>>>>>>>>>> pre-aggregations.)
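[Editorial sketch] The pane idea can be sketched as follows (illustrative names, assuming slide-aligned panes; not the actual proof-of-concept code): aggregate once per slide-sized pane, then combine the panes covering a window, so each element is folded in exactly once even when windows overlap.

```java
import java.util.HashMap;
import java.util.Map;

// Pane-based pre-aggregation sketch for sums: each element is added to
// exactly one pane; an overlapping window is the sum of its panes.
public class PaneSums {
    private final long paneSize;
    private final Map<Long, Long> paneSum = new HashMap<>(); // pane start -> partial sum

    public PaneSums(long paneSize) {
        this.paneSize = paneSize;
    }

    public void element(long timestamp, long value) {
        long pane = timestamp - (timestamp % paneSize);
        paneSum.merge(pane, value, Long::sum);
    }

    // Sum of the window [start, start + numPanes * paneSize).
    public long windowSum(long start, int numPanes) {
        long sum = 0;
        for (int i = 0; i < numPanes; i++) {
            sum += paneSum.getOrDefault(start + i * paneSize, 0L);
        }
        return sum;
    }
}
```

With window size w and slide s, each element is aggregated once into its pane and each emission touches only w/s partial values, instead of every buffered element.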
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Cheers,
> > >>>>>>>>>>>>>>>>> Aljoscha
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>
> > >>>>
> > >>>
> > >>
> > >>
> > >
> >
> >
>
