apex-dev mailing list archives

From Bhupesh Chawda <bhup...@datatorrent.com>
Subject Re: Join support in Malhar
Date Wed, 03 May 2017 09:59:24 GMT
The main difference lies in the managed state implementations that the two
join implementations use.
The advantage mainly comes from the fact that Join Impl 1 uses
ManagedTimeStateImpl (key buckets + time buckets), while Join Impl 2 is
based on the other two implementations (each with the notion of either a
key bucket or a time bucket).
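
As a rough illustration of why a key bucket + time bucket layout can help a
time-bounded join, here is a minimal, self-contained sketch. It is not the
Malhar managed state API: the class TwoLevelBucketSketch and its methods are
hypothetical names, the store is a plain in-memory map, and the hashing and
bucket span are assumptions made only for the example. The point it tries to
show is that a lookup touches one time bucket inside one key bucket, and
that expiry can drop whole time buckets instead of scanning entries.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy store, NOT the Malhar ManagedTimeStateImpl API.
public class TwoLevelBucketSketch {
    private final int numKeyBuckets;
    private final long timeBucketSpanMillis;

    // key bucket id -> (time bucket id -> entries stored in that bucket)
    private final Map<Integer, TreeMap<Long, Map<String, String>>> buckets = new HashMap<>();

    public TwoLevelBucketSketch(int numKeyBuckets, long timeBucketSpanMillis) {
        this.numKeyBuckets = numKeyBuckets;
        this.timeBucketSpanMillis = timeBucketSpanMillis;
    }

    // Assumption: keys are spread over key buckets by hash code.
    private int keyBucket(String key) {
        return Math.floorMod(key.hashCode(), numKeyBuckets);
    }

    // Assumption: time buckets are fixed-width slices of event time.
    private long timeBucket(long timestampMillis) {
        return Math.floorDiv(timestampMillis, timeBucketSpanMillis);
    }

    public void put(String key, long timestampMillis, String value) {
        buckets.computeIfAbsent(keyBucket(key), k -> new TreeMap<>())
               .computeIfAbsent(timeBucket(timestampMillis), t -> new HashMap<>())
               .put(key, value);
    }

    // Looks only inside the single time bucket that covers timestampMillis.
    public String get(String key, long timestampMillis) {
        TreeMap<Long, Map<String, String>> byTime = buckets.get(keyBucket(key));
        if (byTime == null) {
            return null;
        }
        Map<String, String> entries = byTime.get(timeBucket(timestampMillis));
        return entries == null ? null : entries.get(key);
    }

    // Drops every time bucket that lies entirely before the bucket containing
    // expiryMillis and returns how many time buckets were dropped.
    public int expireBefore(long expiryMillis) {
        long firstLiveBucket = timeBucket(expiryMillis);
        int dropped = 0;
        for (TreeMap<Long, Map<String, String>> byTime : buckets.values()) {
            Map<Long, Map<String, String>> old = byTime.headMap(firstLiveBucket);
            dropped += old.size();
            old.clear(); // clearing the view removes the buckets from byTime
        }
        return dropped;
    }
}
```

A store with only key buckets would have to inspect individual entries to
expire old tuples, and a store with only time buckets would have to search a
whole time slice for a key. Whether this translates into a measurable
difference between the two join implementations would still need to be
benchmarked.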

I agree that the windowed version addresses a more generic use case. My only
concern is whether there are use cases or user communities that are not
familiar with windowed semantics and might prefer the other implementation
instead. Would that warrant keeping the other implementation around?

~ Bhupesh


_______________________________________________________

Bhupesh Chawda

E: bhupesh@datatorrent.com | Twitter: @bhupeshsc

www.datatorrent.com  |  apex.apache.org



On Fri, Apr 28, 2017 at 10:09 AM, Thomas Weise <thw@apache.org> wrote:

> There is one more important difference not mentioned:
>
> Join Impl 1 doesn't work and Join Impl 2 does :)
>
> Can you clarify why a (working) Join Impl 1 would perform better? And if it
> is the case, how the amount of work fixing 1 would stack up against
> improving 2?
>
> Join Impl 2 has greater flexibility due to the generalized windowing. If
> everything else is the same, I prefer we put our efforts there.
>
> Thanks,
> Thomas
>
>
>
> On Wed, Apr 26, 2017 at 11:14 PM, Bhupesh Chawda <bhupesh@apache.org>
> wrote:
>
> > Hi Community,
> >
> > Currently the support for join in Malhar is a little fuzzy for the end
> > user. We have multiple implementations -
> >
> >    1. Join Impl 1 - Inner Join implementation, based on Managed state
> >    2. Join Impl 2 - Merge operator, Windowed implementation, based on
> >    Spillable structures (based on managed state)
> >
> > Following are the differences between the two:
> >
> >    - As the name implies, Join Impl 1 is meant for inner joins, while
> >    Join Impl 2 has generic support for inner as well as outer joins.
> >    - Join Impl 1 supports sliding time windows with support for expiring
> >    old tuples. Join Impl 2 needs understanding of windowing concepts and
> >    uses watermarking support for functioning.
> >    - By looking at the implementations of managed state used by Join
> >    Impl 1 and Join Impl 2, it seems like Join Impl 1 would have a
> >    performance advantage over Join Impl 2.
> >
> > The purpose of this email is to see what can be done to simplify the join
> > usability in Malhar. Following are some options:
> >
> >    1. Keep both implementations with clear documentation of the usability
> >    for both.
> >    2. Remove Join Impl 1 from Malhar and work with Join Impl 2 to improve
> >    performance. Note that even though Join Impl 1 addresses a very
> >    specific use case, it is the most common requirement in streaming
> >    join use cases.
> >    3. Any other option?
> >
> > Thanks.
> >
> > ~ Bhupesh
>
