storm-user mailing list archives

From Harsha <st...@harsha.io>
Subject Re: Urgent - Some workers stop processing after a few seconds
Date Wed, 25 Feb 2015 16:40:31 GMT

My bad, I was looking at another supervisor.log. There are no errors in
the supervisor and worker logs.

-Harsha

On Wed, Feb 25, 2015, at 08:29 AM, Martin Illecker wrote:
> Hi Harsha,
>
> I'm using three c3.4xlarge EC2 instances: 1) Nimbus, WebUI, Zookeeper,
> Supervisor 2) Zookeeper, Supervisor 3) Zookeeper, Supervisor
>
> I cannot find that error message in my attached supervisor log. By the
> way, I'm running on Ubuntu EC2 nodes, so there is no C:\ path.
>
> I have not made any changes to these timeout values; they should be the
> defaults:
>
> storm.zookeeper.session.timeout: 20000
> storm.zookeeper.connection.timeout: 15000
> supervisor.worker.timeout.secs: 30
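
For reference, a minimal sketch (not from the original message) of raising the
two ZooKeeper timeouts per topology with Storm 0.9.x's backtype.storm.Config;
supervisor.worker.timeout.secs is read by the supervisor daemon, so it belongs
in storm.yaml on the cluster nodes rather than in topology config. The class
and method names here are illustrative only.

    import backtype.storm.Config;

    public class TimeoutConfigSketch {
        // Builds a topology Config with explicit ZooKeeper timeout overrides.
        public static Config buildConfig() {
            Config conf = new Config();
            // Keys come from backtype.storm.Config; the values are the
            // defaults Martin quotes above, written out as overrides.
            conf.put(Config.STORM_ZOOKEEPER_SESSION_TIMEOUT, 20000);
            conf.put(Config.STORM_ZOOKEEPER_CONNECTION_TIMEOUT, 15000);
            return conf;
        }
    }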
>
> Thanks! Best regards Martin
>
>
> 2015-02-25 17:03 GMT+01:00 Harsha <storm@harsha.io>:
>> Hi Martin,
>>
>> Can you share your storm.zookeeper.session.timeout,
>> storm.zookeeper.connection.timeout and supervisor.worker.timeout.secs
>> values? Looking at the supervisor logs, I see:
>>
>> Error when processing event java.io.FileNotFoundException: File
>> 'c:\hdistorm\workers\f3e70029-c5c8-4f55-a4a1-396096b37509\heartbeats\1417082031858'
>>
>> You might be running into
>> https://issues.apache.org/jira/browse/STORM-682. Is your zookeeper
>> cluster on a different set of nodes, and can you check that you are
>> able to connect to it without any issues?
>>
>> -Harsha
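
A minimal sketch of the kind of connectivity check Harsha suggests, assuming
the plain org.apache.zookeeper Java client on the classpath; the hostnames and
class name are placeholders, not details from the thread.

    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    public class ZkConnectivityCheck {
        public static void main(String[] args) throws Exception {
            // Hypothetical connect string; use the ensemble Storm points at.
            String connectString = "zk1:2181,zk2:2181,zk3:2181";
            ZooKeeper zk = new ZooKeeper(connectString, 20000, new Watcher() {
                @Override
                public void process(WatchedEvent event) {
                    System.out.println("ZooKeeper event: " + event);
                }
            });
            // The client connects asynchronously; exists() forces a round trip
            // to the ensemble and returns a Stat for the root znode.
            System.out.println("Root znode stat: " + zk.exists("/", false));
            System.out.println("Session state:   " + zk.getState());
            zk.close();
        }
    }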
>>
>>
>>
>> On Wed, Feb 25, 2015, at 03:49 AM, Martin Illecker wrote:
>>> Hi,
>>>
>>> I'm still observing this strange issue: two of the three workers stop
>>> processing after a few seconds (each worker runs on its own dedicated
>>> EC2 node).
>>>
>>> My guess would be that the output stream of the one spout is not
>>> properly distributed over all three workers, or is somehow directed to
>>> one worker only. But *shuffleGrouping* should guarantee an even
>>> distribution among the bolt instances, right?
>>>
>>> I'm using the following topology:
>>>
>>> TopologyBuilder builder = new TopologyBuilder();
>>> builder.setSpout("dataset-spout", spout);
>>> builder.setBolt("tokenizer-bolt", tokenizerBolt, 3)
>>>     .shuffleGrouping("dataset-spout");
>>> builder.setBolt("preprocessor-bolt", preprocessorBolt, 3)
>>>     .shuffleGrouping("tokenizer-bolt");
>>> conf.setMaxSpoutPending(2000);
>>> conf.setNumWorkers(3);
>>> StormSubmitter.submitTopology(TOPOLOGY_NAME, conf, builder.createTopology());
>>>
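For context, a self-contained version of the same wiring, assuming Storm 0.9.x
(backtype.storm) with a TestWordSpout and a placeholder bolt standing in for
the spout and bolt classes Martin does not show; it is a sketch, not his
actual code.

    import backtype.storm.Config;
    import backtype.storm.StormSubmitter;
    import backtype.storm.testing.TestWordSpout;
    import backtype.storm.topology.BasicOutputCollector;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.TopologyBuilder;
    import backtype.storm.topology.base.BaseBasicBolt;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Tuple;
    import backtype.storm.tuple.Values;

    public class DatasetTopologySketch {
        // Placeholder name; the real TOPOLOGY_NAME is not shown in the thread.
        private static final String TOPOLOGY_NAME = "dataset-topology";

        // Placeholder bolt standing in for the tokenizer/preprocessor bolts.
        public static class PassThroughBolt extends BaseBasicBolt {
            @Override
            public void execute(Tuple input, BasicOutputCollector collector) {
                collector.emit(new Values(input.getString(0)));
            }
            @Override
            public void declareOutputFields(OutputFieldsDeclarer declarer) {
                declarer.declare(new Fields("word"));
            }
        }

        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();
            // TestWordSpout stands in for the dataset spout from the thread.
            builder.setSpout("dataset-spout", new TestWordSpout());
            builder.setBolt("tokenizer-bolt", new PassThroughBolt(), 3)
                   .shuffleGrouping("dataset-spout");
            builder.setBolt("preprocessor-bolt", new PassThroughBolt(), 3)
                   .shuffleGrouping("tokenizer-bolt");

            Config conf = new Config();
            conf.setMaxSpoutPending(2000);
            conf.setNumWorkers(3);

            StormSubmitter.submitTopology(TOPOLOGY_NAME, conf,
                    builder.createTopology());
        }
    }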
>>> I have attached the screenshots of the topology and the truncated
>>> worker and supervisor log of one idle worker.
>>>
>>> The supervisor log includes a few interesting lines, but I think they
>>> are normal?
>>>
>>> supervisor [INFO] e76bc338-2ba5-444b-9854-bca94f9587b7 still hasn't started
>>>
>>> I hope someone can help me with this issue!
>>>
>>> Thanks Best regards Martin
>>>
>>>
>>> 2015-02-24 20:37 GMT+01:00 Martin Illecker <millecker@apache.org>:
>>>> Hi,
>>>>
>>>> I'm trying to run a topology on EC2, but I'm observing the
>>>> following strange issue:
>>>>
>>>> Some workers stop processing after a few seconds, without any error
>>>> in the worker log.
>>>>
>>>> For example, my topology consists of 3 workers and each worker is
>>>> running on its own EC2 node. Two of them stop processing after a
>>>> few seconds. But they have already processed several tuples
>>>> successfully.
>>>>
>>>> I'm using only one spout and shuffleGrouping for all bolts. If I add
>>>> more spouts, then all workers keep processing, but the performance
>>>> is very poor.
>>>>
>>>> Does anyone have a guess why this happens?
>>>>
>>>> The topology is currently running at: http://54.155.156.203:8080
>>>>
>>>> Thanks!
>>>>
>>>> Martin
>>>>
>>>>
>>>>
>>>
>>> Email had 4 attachments:


>>>  * topology.jpeg 161k (image/jpeg)
>>>  * component.jpeg 183k (image/jpeg)
>>>  * supervisor.log 7k (application/octet-stream)
>>>  * worker.log 37k (application/octet-stream)
>>
>

