apex-dev mailing list archives

From Vlad Rozov <v.rozo...@gmail.com>
Subject Re: Difference between setup() and activate()
Date Fri, 04 Aug 2017 11:39:23 GMT
The recommendation to use activate() over setup() is questionable now
that back pressure has been introduced. In a distributed streaming
application, operators already need to handle downstream downtime,
differences between upstream and downstream throughput, busy output
ports, and back pressure. Once those conditions are handled, a
difference of a few hundred milliseconds between setup() and activate()
is not something I would be concerned about as an operator developer.
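
For readers following the thread, the lifecycle under discussion can be
sketched roughly as below. This is a minimal, self-contained illustration
and not Apex code: the Lifecycle interface, SketchInputOperator, deploy()
and restoreFromCheckpoint() are hypothetical stand-ins (the real callbacks
live in com.datatorrent.api.Operator and ActivationListener and take a
context argument), and plain Java serialization is used only to mimic a
checkpoint. It shows that transient fields do not survive a restore, so
both setup() and activate() have to run again on the new instance.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class LifecycleSketch {

    /** Hypothetical stand-in for the lifecycle callbacks; not the real Apex API. */
    public interface Lifecycle {
        void setup();
        void activate();
        void deactivate();
        void teardown();
    }

    /** Toy input operator: one checkpointed counter and two transient fields. */
    public static class SketchInputOperator implements Lifecycle, Serializable {
        private static final long serialVersionUID = 1L;

        public long emitted;                    // part of the checkpointed state
        public transient StringBuilder buffer;  // rebuilt in setup()
        public transient boolean connected;     // (re)established in activate()

        @Override public void setup() { buffer = new StringBuilder(); }
        @Override public void activate() { connected = true; }  // e.g. open the external stream here
        @Override public void deactivate() { connected = false; }
        @Override public void teardown() { buffer = null; }

        public void emitTuple() {
            if (!connected) {
                throw new IllegalStateException("activate() has not been called");
            }
            buffer.append('.');
            emitted++;
        }
    }

    /** Mimics a (re)deployment: setup() first, then activate() shortly before processing starts. */
    public static void deploy(Lifecycle op) {
        op.setup();
        op.activate();
    }

    /** Mimics checkpoint + restore with a serialization round trip; no lifecycle calls happen here. */
    public static SketchInputOperator restoreFromCheckpoint(SketchInputOperator op) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(op);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SketchInputOperator) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("checkpoint round trip failed", e);
        }
    }

    public static void main(String[] args) {
        SketchInputOperator op = new SketchInputOperator();
        deploy(op);
        op.emitTuple();
        op.emitTuple();

        // The restored copy keeps the counter but loses both transient fields.
        SketchInputOperator restored = restoreFromCheckpoint(op);
        System.out.println("emitted=" + restored.emitted
            + " buffer=" + restored.buffer + " connected=" + restored.connected);

        deploy(restored);  // new instance: setup() and activate() both run again
        restored.emitTuple();
        System.out.println("emitted=" + restored.emitted + " connected=" + restored.connected);
    }
}
```

In this sketch it makes no functional difference whether the stream is
opened in setup() or in activate(); the choice only moves the point at
which data can start arriving relative to the first window.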

Thank you,


On 8/3/17 15:37, Pramod Immaneni wrote:
> Yes, activate() is called closer to the start of tuple processing as far as
> Apex is concerned. So if you are writing an input operator that does
> asynchronous processing, where you start receiving data as soon as you
> open a connection to your external source, it is better to open the
> connection in activate() to reduce latency and buffer build-up.
> On Thu, Aug 3, 2017 at 3:07 PM, Vlad Rozov <v.rozov64@gmail.com> wrote:
>> Correct, both setup() and activate() are called when an operator is
>> restored from a checkpoint. When an operator is restored from a checkpoint
>> it is considered to be a new instance/deployment of an operator with its
>> state reset to a checkpoint. In this case Apex core gives an operator a
>> chance to initialize transient fields in either setup() or activate().
>> I am not aware of any use case where the platform will go through an
>> activate/deactivate cycle without setup/teardown, but such a code path may
>> be introduced in the future (for example, it may be used to manage an input
>> operator with a high emit rate). It is better not to make any assumptions
>> about how many times activate/deactivate may be called.
>> Currently the main difference between setup() and activate() is described
>> in the Javadoc for ActivationListener:
>>   An example of where one would consider implementing ActivationListener
>>   is an input operator which wants to consume a high throughput stream.
>>   Since there is typically at least a few hundreds of milliseconds between
>>   the time the setup method is called and the first window, you would want
>>   to place the code to activate the stream inside activate instead of setup.
>> My recommendation is to use setup() to initialize transient fields unless
>> you need to deal with the above case.
>> Thank you,
>> Vlad
>> On 8/2/17 13:31, Ananth G wrote:
>>> Hello Vlad,
>>> Thanks for your response.
>>>> Do you refer to restoring from a checkpoint as serialize/deserialize
>>>> cycles?
>>> Yes.
>>>> In case of restoring from a checkpoint (deserialization) setup() is a
>>>> part of a redeployment request, AFAIK.
>>> This sounds a bit in contradiction to the response from Sanjay in the
>>> mail thread below. I glanced quickly through the apex-core code and it
>>> looks like both are being called (perhaps I am entirely wrong on this, as
>>> it was only a quick scan). I was referring to the code in
>>> StreamingContainer.java in the engine package and the method called
>>> deploy().
>>>> Please see ActivationListener javadoc for details when it is necessary to
>>>> use activate() vs setup().
>>> I had to raise this question in the mail after going through the
>>> javadoc. The javadoc is a bit cryptic in this scenario of
>>> serialize/deserialize. Also, the javadoc is not clear as to what is meant by
>>> activate/deactivate being called multiple times whereas setup is called
>>> once in a lifetime of the operator. If setup is called once in the lifetime
>>> of an operator per the javadoc, does that mean once in the lifetime of the
>>> JVM instantiating it via the constructor, or once across the deserialize
>>> cycles of the passivated operator state? If it is once across all passivated
>>> instances of the operator, then setup() would not be called multiple times
>>> and hence is not a great location for transient variables? If setup() is
>>> called across deserialize cycles, then I find it more confusing as to why we
>>> need setup() and activate() methods with almost the same functionality.
>>> Thoughts?
>>> Regards,
>>> Ananth
>>> On 1 Aug 2017, at 3:38 am, Vlad Rozov <v.rozov@datatorrent.com> wrote:
>>>> Do you refer to restoring from a checkpoint as serialize/deserialize
>>>> cycles? There are no calls to setup/teardown and/or activate/deactivate
>>>> during checkpointing/serialization. In case of restoring from a checkpoint
>>>> (deserialization) setup() is a part of a redeployment request, AFAIK. The
>>>> best answer to question 3 is it depends. In most cases using setup() to
>>>> resolve all transient field is as good as doing that in activate(). Please
>>>> see ActivationListener javadoc for details when it is necessary to use
>>>> activate() vs setup().
>>>> Thank you,
>>>> Vlad
>>>> On 7/29/17 19:58, Sanjay Pujare wrote:
>>>>> The Javadoc comment for
>>>>> com.datatorrent.api.Operator.ActivationListener<CONTEXT> (in
>>>>> https://github.com/apache/apex-core/blob/master/api/src/main/java/com/datatorrent/api/Operator.java)
>>>>> should hopefully answer your questions.
>>>>> Specifically:
>>>>> 1. No, setup() is called only once in the entire lifetime
>>>>> (http://apex.apache.org/docs/apex/operator_development/#setup-call).
>>>>> 2. Yes. When an operator is "activated" - first time in its life or
>>>>> reactivation after a failover - activate() is called before the first
>>>>> beginWindow() is called.
>>>>> 3. Yes.
>>>>> On Sun, Jul 30, 2017 at 12:18 AM, Ananth G <ananthg.apex@gmail.com>
>>>>> wrote:
>>>>>> Hello All,
>>>>>> I was looking at the documentation and could not get a clear distinction
>>>>>> between the behaviours of setup() and activate() in scenarios where an
>>>>>> operator is passivated (e.g. application shutdown, repartition use cases)
>>>>>> and then brought back to life again. Could someone from the community
>>>>>> advise on the following questions?
>>>>>> 1. Is setup() called in these scenarios (serialize/deserialize cycles)
>>>>>> as well?
>>>>>> 2. I am assuming activate() is called in these scenarios? The javadoc
>>>>>> for activation states that activate() can be called multiple times
>>>>>> (without explicitly stating why), and my assumption is that it is
>>>>>> because of these scenarios.
>>>>>> 3. If setup() is only called once during the lifetime of an operator, is
>>>>>> it fair to assume that activate() is the best place to resolve all the
>>>>>> transient fields of an operator?
>>>>>> Regards,
>>>>>> Ananth
