apex-dev mailing list archives

From Bhupesh Chawda <bhup...@datatorrent.com>
Subject Re: Join support in Malhar
Date Tue, 09 May 2017 15:17:43 GMT
Looks like it would be okay to remove Join Impl 1 from Malhar.
The windowed merge implementation can be worked on and simplified to
address simpler use cases and ease of use.

Before proceeding with this, it would be good to hear what other community
members think.
I will proceed with creating the JIRAs and PR if there is no response in a
couple of days.
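
For reference, here is a minimal sketch of the narrow use case discussed in
this thread: an inner join over a sliding time window that expires old
tuples. It is plain Java and does not use any Malhar or managed state
classes. The class name, method names, and the "left|right" output format
are hypothetical, and it assumes that tuples arrive in non-decreasing
timestamp order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the Malhar implementation.
final class SlidingWindowInnerJoin {

    // A buffered tuple: its payload plus its event time in milliseconds.
    private static final class Timed {
        final String value;
        final long time;

        Timed(String value, long time) {
            this.value = value;
            this.time = time;
        }
    }

    private final long windowMillis;
    // Per-key buffers for each input stream, held in memory for simplicity.
    private final Map<String, Deque<Timed>> left = new HashMap<>();
    private final Map<String, Deque<Timed>> right = new HashMap<>();

    SlidingWindowInnerJoin(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    // Feed a tuple from the left stream; returns the joined results it produces.
    List<String> onLeft(String key, String value, long time) {
        return process(key, value, time, left, right, true);
    }

    // Feed a tuple from the right stream; returns the joined results it produces.
    List<String> onRight(String key, String value, long time) {
        return process(key, value, time, right, left, false);
    }

    private List<String> process(String key, String value, long time,
                                 Map<String, Deque<Timed>> own,
                                 Map<String, Deque<Timed>> other,
                                 boolean fromLeft) {
        long cutoff = time - windowMillis;

        // Buffer the new tuple and expire same-key tuples that fell out of the window.
        Deque<Timed> mine = own.computeIfAbsent(key, k -> new ArrayDeque<>());
        mine.addLast(new Timed(value, time));
        mine.removeIf(t -> t.time < cutoff);

        List<String> out = new ArrayList<>();
        Deque<Timed> candidates = other.get(key);
        if (candidates == null) {
            return out; // inner join: no match on the other side, nothing is emitted
        }

        // Expire old tuples on the other side, then join with what remains.
        candidates.removeIf(t -> t.time < cutoff);
        for (Timed t : candidates) {
            out.add(fromLeft ? value + "|" + t.value : t.value + "|" + value);
        }
        return out;
    }
}
```

With a 1000 ms window, onLeft("k", "a", 0) followed by onRight("k", "b", 500)
emits "a|b", while a later onRight("k", "c", 2000) emits nothing because "a"
has expired by then. A wrapper of this shape over the windowed implementation
would have to map such calls onto its windowing and watermark configuration,
which is not shown here.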

~ Bhupesh


_______________________________________________________

Bhupesh Chawda

E: bhupesh@datatorrent.com | Twitter: @bhupeshsc

www.datatorrent.com  |  apex.apache.org



On Sat, May 6, 2017 at 11:07 PM, Thomas Weise <thw@apache.org> wrote:

>
> On Wed, May 3, 2017 at 2:59 AM, Bhupesh Chawda <bhupesh@datatorrent.com>
> wrote:
>
> > The main difference is in the implementations of managed state that are
> > used in the two join impls.
> > The advantage mainly comes from the fact that Join impl 1 uses
> > ManagedTimeStateImpl (key buckets + time buckets) while Join impl 2 is
> > based on the other two implementations (both with the notion of either a
> > key or a time bucket).
> >
>
> How does it affect performance and scalability? I think that's the key
> question it comes down to.
>
>
>
> >
> > I agree that the windowed version addresses a more generic use case. My
> > only concern is whether there are use cases / user communities which are
> > not familiar with the windowed semantics and might prefer the other
> > implementation instead. Would that warrant keeping the other
> > implementation around?
> >
>
> It should be possible to create a module or wrapper if the intention is to
> simplify a specific use case?
>
>
> >
> >
> >
> >
> > On Fri, Apr 28, 2017 at 10:09 AM, Thomas Weise <thw@apache.org> wrote:
> >
> > > There is one more important difference not mentioned:
> > >
> > > Join Impl 1 doesn't work and Join Impl 2 does :)
> > >
> > > Can you clarify why a (working) Join Impl 1 would perform better? And
> > > if it is the case, how the amount of work fixing 1 would stack up
> > > against improving 2?
> > >
> > > Join Impl 2 has greater flexibility due to the generalized windowing.
> > > If everything else is the same, I prefer we put our efforts there.
> > >
> > > Thanks,
> > > Thomas
> > >
> > >
> > >
> > > On Wed, Apr 26, 2017 at 11:14 PM, Bhupesh Chawda <bhupesh@apache.org>
> > > wrote:
> > >
> > > > Hi Community,
> > > >
> > > > Currently the support for join in Malhar is a little fuzzy for the end
> > > > user. We have multiple implementations:
> > > >
> > > >    1. Join Impl 1 - Inner Join implementation, based on Managed state
> > > >    2. Join Impl 2 - Merge operator, Windowed implementation, based on
> > > >    Spillable structures (based on managed state)
> > > >
> > > > Following are the differences between the two:
> > > >
> > > >    - As the name implies, Join Impl 1 is meant for inner joins, while
> > > >    Join Impl 2 has generic support for inner as well as outer joins.
> > > >    - Join Impl 1 supports sliding time windows with support for
> > > >    expiring old tuples. Join Impl 2 needs understanding of windowing
> > > >    concepts and uses watermarking support for functioning.
> > > >    - By looking at the implementations of managed state used by Join
> > > >    Impl 1 and Join Impl 2, it seems like Join Impl 1 would have a
> > > >    performance advantage over Join Impl 2.
> > > >
> > > > The purpose of this email is to see what can be done to simplify the
> > > > join usability in Malhar. Following are some options:
> > > >
> > > >    1. Keep both implementations with clear documentation of the
> > > >    usability for both.
> > > >    2. Remove Join Impl 1 from Malhar and work with Join Impl 2 to
> > > >    improve performance. Note that even though Join Impl 1 addresses a
> > > >    very specific use case, it is the most common requirement in
> > > >    streaming join use cases.
> > > >    3. Any other option?
> > > >
> > > > Thanks.
> > > >
> > > > ~ Bhupesh
> > > >
> > >
> >
>
