apex-dev mailing list archives

From Bhupesh Chawda <bhup...@apache.org>
Subject Join support in Malhar
Date Thu, 27 Apr 2017 06:14:01 GMT
Hi Community,

Currently, join support in Malhar is a little fuzzy for the end user.
We have two implementations:

   1. Join Impl 1 - Inner Join implementation, based on Managed state
   2. Join Impl 2 - Merge operator, windowed implementation, based on
   Spillable structures (which are themselves based on managed state)

Following are the differences between the two:

   - As the name implies, Join Impl 1 is meant for inner joins, while Join
   Impl 2 has generic support for inner as well as outer joins.
   - Join Impl 1 supports sliding time windows and expires old tuples.
   Join Impl 2 requires an understanding of windowing concepts and relies
   on watermarks to function.
   - Judging by how Join Impl 1 and Join Impl 2 each use managed state,
   it seems that Join Impl 1 would have a performance advantage over
   Join Impl 2.
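
To make the sliding-window behaviour in the second point concrete, here is a
minimal in-memory sketch of an inner join that expires old tuples. It is only
an illustration: the class name, method names, and the expiry parameter are
hypothetical, it does not use managed state or any Malhar API, and it assumes
tuples arrive in non-decreasing time order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; this is not a Malhar operator. */
class SlidingInnerJoinSketch<K, L, R> {

  /** A buffered value together with its arrival time. */
  private static final class Timed<V> {
    final long time;
    final V value;
    Timed(long time, V value) { this.time = time; this.value = value; }
  }

  /** One joined output pair. */
  static final class Pair<L, R> {
    final L left;
    final R right;
    Pair(L left, R right) { this.left = left; this.right = right; }
  }

  private final long expiryMillis;
  // Per-key buffers for each stream (in memory here, as an assumption).
  private final Map<K, Deque<Timed<L>>> leftState = new HashMap<>();
  private final Map<K, Deque<Timed<R>>> rightState = new HashMap<>();

  SlidingInnerJoinSketch(long expiryMillis) { this.expiryMillis = expiryMillis; }

  /** Buffers a left tuple and joins it with unexpired right tuples of the same key. */
  List<Pair<L, R>> onLeft(K key, L value, long time) {
    expire(leftState, time);
    expire(rightState, time);
    leftState.computeIfAbsent(key, k -> new ArrayDeque<>()).addLast(new Timed<>(time, value));
    List<Pair<L, R>> out = new ArrayList<>();
    Deque<Timed<R>> matches = rightState.get(key);
    if (matches != null) {
      for (Timed<R> r : matches) out.add(new Pair<>(value, r.value));
    }
    return out;
  }

  /** Buffers a right tuple and joins it with unexpired left tuples of the same key. */
  List<Pair<L, R>> onRight(K key, R value, long time) {
    expire(leftState, time);
    expire(rightState, time);
    rightState.computeIfAbsent(key, k -> new ArrayDeque<>()).addLast(new Timed<>(time, value));
    List<Pair<L, R>> out = new ArrayList<>();
    Deque<Timed<L>> matches = leftState.get(key);
    if (matches != null) {
      for (Timed<L> l : matches) out.add(new Pair<>(l.value, value));
    }
    return out;
  }

  /** Drops tuples older than (now - expiryMillis); the head of each deque is the oldest. */
  private <V> void expire(Map<K, Deque<Timed<V>>> state, long now) {
    Iterator<Deque<Timed<V>>> it = state.values().iterator();
    while (it.hasNext()) {
      Deque<Timed<V>> q = it.next();
      while (!q.isEmpty() && q.peekFirst().time < now - expiryMillis) q.pollFirst();
      if (q.isEmpty()) it.remove();
    }
  }
}
```

Only matched pairs are ever emitted, which is what makes this an inner join.
An outer join would also have to emit unmatched tuples once it is known that
no match can still arrive, which is roughly where watermarks come in.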

The purpose of this email is to see what can be done to make joins in
Malhar easier to use. Following are some options:

   1. Keep both implementations, with clear documentation of when to use
   each.
   2. Remove Join Impl 1 from Malhar and work on improving the performance
   of Join Impl 2. Note that even though Join Impl 1 addresses a very
   specific use case, that use case is the most common requirement in
   streaming joins.
   3. Any other option?


~ Bhupesh

