samza-dev mailing list archives

From Bharath Kumarasubramanian <bkumarasubraman...@linkedin.com>
Subject Re: [VOTE] SEP-8: Add in-memory system consumer & producer
Date Fri, 15 Sep 2017 20:57:02 GMT
Thanks for your feedback. My answers are inline.


On 9/14/17, 1:23 AM, "Yi Pan" <nickpan47@gmail.com> wrote:

    Hi, Bharath,
    
    Overall looks good! I have the following comments:
    
    i) Question on the Type of IME + data partition:
    
    How do we enforce that user adds IME w/ the expected partition id to the
    corresponding sub-collection?
    For IME as the data source, we will take a collection instead of a collection of collections,
since the partition information is already known from the IME itself.
    I will update the wiki to make this clearer and more explicit. Let me know if this is acceptable.
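
To illustrate why a flat collection is enough when each envelope names its own partition, here is a minimal sketch. It is not the SEP-8 API: `Envelope` and `group_by_partition` are hypothetical names, and `Envelope` is only a stand-in for an IME that carries a partition id.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List


@dataclass(frozen=True)
class Envelope:
    """Hypothetical stand-in for an IME: it already names its partition."""
    partition: int
    message: Any


def group_by_partition(envelopes: Iterable[Envelope]) -> Dict[int, List[Envelope]]:
    """Derive the per-partition sub-collections from a flat input collection."""
    grouped: Dict[int, List[Envelope]] = defaultdict(list)
    for envelope in envelopes:
        # The partition comes from the envelope, not from the caller's nesting.
        grouped[envelope.partition].append(envelope)
    return dict(grouped)


flat_input = [Envelope(0, "a"), Envelope(1, "b"), Envelope(0, "c")]
by_partition = group_by_partition(flat_input)
```

Because the system does the grouping, the user cannot place an envelope in a sub-collection that disagrees with its partition id.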

    
    
    ii) In the architecture graph, what's the difference between SSP queues and
    Data source/sink? What is the layer exposed to the user (I.e. programmer)?
    SSP queues are intermediate buffers that the in-memory system uses to pass messages; they are
not exposed to the programmer.
    Data source/sink refers to the handle on the input data provided by the end user, and to the
output to which the system flushes data for the end user to access.
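
The distinction above can be sketched as follows. This is an illustrative model only, assuming a single stream and a single partition; `run_in_memory` and its parameters are hypothetical names, not part of the design document.

```python
from collections import deque
from typing import Any, Callable, Iterable, List


def run_in_memory(source: Iterable[Any], process: Callable[[Any], Any]) -> List[Any]:
    """Move data from a user-provided source to a user-visible sink.

    The deque plays the role of an SSP queue: an internal buffer that the
    caller never sees. Only `source` and the returned sink are user-facing.
    """
    internal_queue = deque(source)  # intermediate buffer, hidden from the user
    sink: List[Any] = []            # output handle handed back to the user
    while internal_queue:
        sink.append(process(internal_queue.popleft()))
    return sink


doubled = run_in_memory([1, 2, 3], lambda x: x * 2)
```

The user supplies `source` and reads the returned sink; the queue in between is an implementation detail.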
    
    
    ii) Agree w/ the approach to use a customized queues managed by the admin.
    However, the reason not to use BEM is not very clear. For the matter of
    fact, BEM is just one optional base class for SystemConsumer implementation.
    Not sure why we necessarily need to be limited by BEM.
    I agree that BEM is just an optional helper class with a bunch of utility methods for implementing
a SystemConsumer. Going down that path would require either the SystemProducer implementation
to hold a reference to the SystemConsumer, so that it can write data into the same buffer, or a single
implementation that acts as both consumer and producer. This isn't a limitation, but it is something
we sign up for if we go with the BEM-based approaches. The benefits that come with BEM aren't justified
for our use case, hence approach C.
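
As a rough sketch of the trade-off described above: if the queues live in a separately owned registry, the producer and consumer each depend on the registry and never on each other. All class and method names below are hypothetical, the details of approach C are not spelled out in this thread, and the sketch ignores thread safety.

```python
from collections import defaultdict, deque
from typing import Any, Deque, Dict, List, Tuple


class QueueRegistry:
    """Hypothetical admin-owned registry: one FIFO queue per (stream, partition)."""

    def __init__(self) -> None:
        self._queues: Dict[Tuple[str, int], Deque[Any]] = defaultdict(deque)

    def queue_for(self, stream: str, partition: int) -> Deque[Any]:
        return self._queues[(stream, partition)]


class InMemoryProducer:
    """Writes into the shared queues; holds no reference to a consumer."""

    def __init__(self, registry: QueueRegistry) -> None:
        self._registry = registry

    def send(self, stream: str, partition: int, message: Any) -> None:
        self._registry.queue_for(stream, partition).append(message)


class InMemoryConsumer:
    """Reads from the shared queues; holds no reference to a producer."""

    def __init__(self, registry: QueueRegistry) -> None:
        self._registry = registry

    def poll(self, stream: str, partition: int) -> List[Any]:
        queue = self._registry.queue_for(stream, partition)
        drained = list(queue)  # take everything currently buffered, in order
        queue.clear()
        return drained


registry = QueueRegistry()
producer = InMemoryProducer(registry)
consumer = InMemoryConsumer(registry)
producer.send("page-views", 0, "event-1")
messages = consumer.poll("page-views", 0)
```

With a BEM-style consumer that owns its buffer, `InMemoryProducer` would instead need a handle on the consumer, or the two classes would have to be merged into one.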
    
    iii) In the code examples,
    
    A) what's the difference between durable state vs non-durable state in
    highlevel API? I don't see any difference. Also, the SEP has clearly
    described that the design is only for InMemory input/output/intermediate
    streams. I noticed that you added changelog as inputs in low-level API. But
    it is not clear how this changelog is defined and why it is an input to the
    application??? 
    The changelog is supposed to be wired through the StoreDescriptor. Since this is not supported
in V1, I will go ahead and remove the use case.
    I will add a section on unsupported use cases and list these there for bookkeeping,
so that we can revisit them for V2.
    
    B) the code example for checkpoint is empty and we have stated that we
    won't support checkpoint in this SEP. Can we remove it?
    Removed it.
    
    
    Thanks!
    
    
    -Yi
    
    On Wed, Sep 6, 2017 at 2:06 PM, xinyu liu <xinyuliu.us@gmail.com> wrote:
    
    > +1 on the overall design. This will make testing a lot easier!
    >
    > Thanks,
    > Xinyu
    >
    > On Wed, Sep 6, 2017 at 10:45 AM, Bharath Kumara Subramanian <
    > codin.martial@gmail.com> wrote:
    >
    > > Hi all,
    > >
    > > Can you please vote for SEP-8?
    > > You can find the design document here
    > > <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=71013043>.
    > >
    > > Thanks,
    > > Bharath
    > >
    >
    
