storm-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stig Rohde Døssing <stigdoess...@gmail.com>
Subject Re: Storm 2.0.0 Local Cluster worker restart
Date Wed, 28 Aug 2019 14:30:30 GMT
Yes, you are right, that is better.

Regarding tests, there's an AsyncLocalizerTest that stubs the blob store,
and makes some assertions about download behavior. Maybe that would be a
good fit?

Den ons. 28. aug. 2019 kl. 14.30 skrev Diogo Monteiro <
diogo.monteiro12@gmail.com>:

> Hi Stig,
> Thanks for the reply.
>
> Makes sense. But isn't it probably better if we just explicitly create an
> empty resources directory (*Files.createDirectories(extractionDest)* is
> enough) in an else clause, instead of calling *extractDirFromJar*?
>
> I say this because if we don't have a resources jar or a classpath url, we
> aren't really extracting anything from a jar.
>
> Another thing, what's your opinion on how I should test this?
> *LocallyCachedTopologyBlob* do not have unit tests.
>
> Diogo.
>
> On Tue, Aug 27, 2019 at 5:32 PM Stig Rohde Døssing <stigdoessing@gmail.com>
> wrote:
>
>> Hi Diogo,
>>
>> Thanks for your thorough explanation. I think you are right, and this is
>> a bug. We'd be happy to see a PR to fix this.
>>
>> I think a decent way to handle this could be adding an extra else clause
>> to
>> https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/localizer/LocallyCachedTopologyBlob.java#L146,
>> and simply create an empty resources directory in the blob extraction
>> directory, by calling extractDirFromJar(resourcesJar, ServerConfigUtils.
>> RESOURCES_SUBDIR, extractionDest);. This is just me spitballing, so
>> please feel free to fix it some other way if you have a better idea.
>>
>> Den tir. 27. aug. 2019 kl. 14.50 skrev Diogo Monteiro <
>> diogo.monteiro12@gmail.com>:
>>
>>> Hi all,
>>>
>>> My name is Diogo and I am a dev for Paddy Power Betfair in Porto,
>>> Portugal. We're running Storm 1.x.x in production for a couple of years and
>>> the time has come for us to upgrade to 2.0.0. We use *LocalCluster* to
>>> run topologies in our local machines to perform manually tests.
>>>
>>> So, going to the point: I was trying to launch a topology that I'm
>>> developing (in 2.0.0) and noticed that the worker was getting restarted
>>> each ~30 seconds.
>>> I placed a breakpoint in the *kill* method of *LocalContainer* (
>>> https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/daemon/supervisor/LocalContainer.java#L66)
>>> to try and understand why the worker was getting restarted.
>>>
>>> The call stack was:
>>>
>>> kill:66, LocalContainer (org.apache.storm.daemon.supervisor)
>>> killContainerFor:269, Slot (org.apache.storm.daemon.supervisor)
>>> handleRunning:724, Slot (org.apache.storm.daemon.supervisor)
>>> stateMachineStep:218, Slot (org.apache.storm.daemon.supervisor)
>>> run:931, Slot (org.apache.storm.daemon.supervisor)
>>>
>>>
>>> With this I can understand that the worker is killed because a blob has
>>> changed (
>>> https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/daemon/supervisor/Slot.java#L724).
>>> In fact, there's a changing blob in the *dynamicState* at that point.
>>>
>>> I checked the *AsyncLocalizer *which downloads, caches blobs locally,
>>> and notifies the Slot state machine of a changing blob.
>>>
>>> I noticed this:
>>>
>>>    -
>>>    https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/localizer/AsyncLocalizer.java#L339
>>>    -
>>>    https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/localizer/AsyncLocalizer.java#L265
>>>    -
>>>    https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/localizer/LocallyCachedTopologyBlob.java#L142
>>>    -
>>>    https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/localizer/LocallyCachedTopologyBlob.java#L192
>>>
>>>
>>> Which tell me that (correct me if I'm wrong):
>>>
>>>    - Supervisor tries to update blobs each 30 seconds.
>>>    - The topology jar blob requires extraction of the resources
>>>    directory (either from a jar or directly in a classpath URL). It does so in
>>>    *fetchUnzipToTemp *and it's existence is checked in
>>>    *isFullyDownloaded*.
>>>    - The Slot is notified of a changing blob if:
>>>       - the remote version is different from the local version (the
>>>       code has changed).
>>>       - OR the blob is not fully downloaded (the jar exists, and the
>>>       extracted resources directory exists).
>>>
>>> Well, I did not have a resources folder under the root of the classpath,
>>> and that's why the worker was being restarted each ~30 seconds, as the Slot
>>> was being notified of a changing blob everytime *updateBlobs* ran.
>>> I created a resources folder (with dummy files) under the root of the
>>> classpath and the problem is now solved.
>>>
>>> However, if I understand correctly, the resources folder is only
>>> required for *multilang*. Our topologies do not use *multilang *and
>>> this do not happen in Storm 1.1.3 for instance.
>>>
>>> Am I seeing or doing something wrong and this is an expected behaviour?
>>> I am happy to contribute if this is in fact something worth to open an
>>> issue and fix.
>>>
>>> Hope this is the right place for these questions, and thanks in advance
>>> for taking your time to look at this.
>>>
>>> Regards,
>>> Diogo
>>>
>>

Mime
View raw message