apex-dev mailing list archives

From Gaurav Gupta <gau...@datatorrent.com>
Subject Re: dynamic application properties proposal
Date Thu, 01 Oct 2015 18:13:43 GMT

The new special property change tuple will be sent to all the operators, and every operator
will have to check whether the property change applies to it. Such requests may
be rare, but is there a way to optimize this?
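
To make the per-operator check concrete, here is a minimal sketch. All names in it
(PropertyChangeTuple, SketchOperator, onPropertyChange) are hypothetical and are not
part of the Apex API. It assumes the tuple keys its changes by logical operator name,
so the check each operator performs is a single hash lookup, and it assumes the
platform would deliver the tuple between end window and begin window.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical control tuple: maps a logical operator name to the
// property values that should change on that operator.
final class PropertyChangeTuple {
    private final Map<String, Map<String, String>> changesByOperator = new HashMap<>();

    PropertyChangeTuple add(String operatorName, String property, String value) {
        changesByOperator
            .computeIfAbsent(operatorName, k -> new HashMap<>())
            .put(property, value);
        return this;
    }

    // Returns the changes addressed to one operator, or an empty map if none.
    Map<String, String> changesFor(String operatorName) {
        return changesByOperator.getOrDefault(
            operatorName, Collections.<String, String>emptyMap());
    }
}

// Hypothetical stand-in for an operator; this is not the Apex Operator interface.
final class SketchOperator {
    final String name;
    final Map<String, String> properties = new HashMap<>();

    SketchOperator(String name) {
        this.name = name;
    }

    // Assumed to be invoked at a window boundary. Returns true if any
    // property on this operator was changed by the tuple.
    boolean onPropertyChange(PropertyChangeTuple tuple) {
        Map<String, String> mine = tuple.changesFor(name);
        if (mine.isEmpty()) {
            return false; // the tuple is not addressed to this operator
        }
        properties.putAll(mine);
        return true;
    }
}

final class PropertyChangeDemo {
    public static void main(String[] args) {
        SketchOperator parser = new SketchOperator("csvParser");
        SketchOperator writer = new SketchOperator("fileWriter");
        PropertyChangeTuple tuple =
            new PropertyChangeTuple().add("csvParser", "regex", "[a-z]+");

        System.out.println(parser.onPropertyChange(tuple)); // addressed to csvParser
        System.out.println(writer.onPropertyChange(tuple)); // not addressed to fileWriter
    }
}
```

With this layout an operator that is not addressed pays for one map lookup per control
tuple, which should be small next to regular tuple processing if such requests are rare.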

- Gaurav

> On Sep 28, 2015, at 3:44 PM, Pramod Immaneni <pramod@datatorrent.com> wrote:
> At the platform level that cannot be guaranteed, as your operator controls
> and manages reading of the data. However, it is not difficult to envision
> writing an operator that would pick up a new dataset when a property is
> changed.
> On Mon, Sep 28, 2015 at 3:33 PM, Ashwin Chandra Putta <
> ashwinchandrap@gmail.com> wrote:
>> Great, looking forward to these changes. Does it also provide a guarantee
>> on which properties are used for which input data sets?
>> A few example use cases:
>> - set property between reads of different batches of files. Say, applying
>> batch name property before processing the next batch of files.
>> - load new configuration file for csv parser before processing next set of
>> data.
>> - apply new regex before parsing next stream of tuples.
>> etc.
>> One approach to allow this is to emit subsequent tuples only starting with the
>> window after the one in which the property change is made. That way, the
>> boundaries between data sets are fixed and the property change is done in
>> between. The user will then have a guarantee on which property value is used
>> on any given tuple.
>> Thoughts?
>> Regards,
>> Ashwin.
>> On Mon, Sep 28, 2015 at 10:17 AM, Pramod Immaneni <pramod@datatorrent.com>
>> wrote:
>>> Apex supports modification of operator properties at runtime, but the current
>>> implementation has the following shortcomings.
>>> 1. A property is not set across all partitions in the same window, as
>>> individual partitions can be on different windows when the property change is
>>> initiated from the client, resulting in inconsistent data for those windows.
>>> I am being generous using the word inconsistent.
>>> 2. Sometimes properties need to be set on more than one logical operator
>>> at the same time to achieve the change the user is seeking. Today these will
>>> be two separate changes happening in two different windows, again resulting
>>> in inconsistent data for some windows. These would need to happen as a
>>> single transaction.
>>> 3. If there is an operator failure before a committed checkpoint after an
>>> operator property is dynamically changed, the operator will restart with the
>>> old property and the change will not be re-applied.
>>> Tim and I did some brainstorming and we have a proposal to overcome
>>> these shortcomings. The main problem in all the above cases is that the
>>> property changes are happening out-of-band of the data flow and hence
>>> independent of windowing. The proposal is to bring the property change
>>> requests into the in-band dataflow so that they are handled consistently
>>> with windowing and in a distributed manner.
>>> The idea is to inject a special property change tuple, containing the
>>> property changes and the identification information of the operators they
>>> affect, into the dataflow at the input operator. The tuple will be injected
>>> at a window boundary, after end window and before begin window, and as this
>>> tuple flows through the DAG the intended operators' properties will be
>>> modified. They will all be modified consistently at the same window. The
>>> tuple can contain more than one property change for more than one logical
>>> operator, and the changes will be applied consistently to the different
>>> logical operators at the same window. In case of failure, the replay of
>>> tuples will ensure that the property change gets reapplied at the correct
>>> window.
>>> Please give your feedback and input on what you think about this proposal.
>>> Thanks
>> --
>> Regards,
>> Ashwin.
