storm-user mailing list archives

From Harsha <st...@harsha.io>
Subject Re: Urgent - Some workers stop processing after a few seconds
Date Wed, 25 Feb 2015 16:03:06 GMT

Hi Martin,

Can you share your storm.zookeeper.session.timeout,
storm.zookeeper.connection.timeout, and supervisor.worker.timeout.secs
settings?

Looking at the supervisor logs, I see:

Error when processing event java.io.FileNotFoundException: File
'c:\hdistorm\workers\f3e70029-c5c8-4f55-a4a1-396096b37509\heartbeats\1417082031858'

You might be running into STORM-682:
https://issues.apache.org/jira/browse/STORM-682

Is your ZooKeeper cluster on a different set of nodes, and can you check
that you are able to connect to it without any issues?

-Harsha
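
One quick way to run the connectivity check mentioned above is ZooKeeper's "ruok" four-letter command, which a healthy server answers with "imok". The following is only a minimal sketch using plain Java sockets: the class name ZkPing, the method isZkOk, and the default host, port (2181), and timeout are assumptions for illustration, not part of Storm or of the original setup. Newer ZooKeeper releases may also require four-letter commands to be whitelisted on the server.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of Storm): probes one ZooKeeper node.
public class ZkPing {

    /** Sends "ruok" and returns true only if the server replies "imok". */
    static boolean isZkOk(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            // Fail fast if the node is unreachable from this machine.
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            socket.setSoTimeout(timeoutMs);

            OutputStream out = socket.getOutputStream();
            out.write("ruok".getBytes(StandardCharsets.US_ASCII));
            out.flush();

            // Read up to four bytes of the reply, stopping at end of stream.
            InputStream in = socket.getInputStream();
            byte[] buf = new byte[4];
            int read = 0;
            while (read < buf.length) {
                int n = in.read(buf, read, buf.length - read);
                if (n < 0) {
                    break;
                }
                read += n;
            }
            return "imok".equals(new String(buf, 0, read, StandardCharsets.US_ASCII));
        } catch (IOException e) {
            // Connection refused, timeout, or reset all count as "not ok".
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed defaults; pass the real ZooKeeper host and port as arguments.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 2181;
        boolean ok = isZkOk(host, port, 3000);
        System.out.println(host + ":" + port + " -> " + (ok ? "imok" : "unreachable or not ok"));
    }
}
```

Running this from each supervisor node against every ZooKeeper host should show whether the workers can reach the ensemble at all. It does not tell you whether the session timeouts are too short.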



On Wed, Feb 25, 2015, at 03:49 AM, Martin Illecker wrote:
> Hi,
>
> I'm still observing this strange issue. Two of three workers stop
> processing after a few seconds. (each worker is running on one
> dedicated EC2 node)
>
> My guess would be that the output stream of the single spout is not
> properly distributed over all three workers, or is somehow directed to
> one worker only. But *shuffleGrouping* should guarantee equal
> distribution among multiple bolts, right?
>
> I'm using the following topology:
>
> TopologyBuilder builder = new TopologyBuilder();
> builder.setSpout("dataset-spout", spout);
> builder.setBolt("tokenizer-bolt", tokenizerBolt, 3)
>     .shuffleGrouping("dataset-spout");
> builder.setBolt("preprocessor-bolt", preprocessorBolt, 3)
>     .shuffleGrouping("tokenizer-bolt");
> conf.setMaxSpoutPending(2000);
> conf.setNumWorkers(3);
> StormSubmitter.submitTopology(TOPOLOGY_NAME, conf, builder.createTopology());
>
> I have attached the screenshots of the topology and the truncated
> worker and supervisor log of one idle worker.
>
> The supervisor log includes a few interesting lines, but I think they
> are normal?
>
> supervisor [INFO] e76bc338-2ba5-444b-9854-bca94f9587b7 still hasn't started
>
> I hope someone can help me with this issue!
>
> Thanks,
> Best regards,
> Martin
>
>
> 2015-02-24 20:37 GMT+01:00 Martin Illecker <millecker@apache.org>:
>> Hi,
>>
>> I'm trying to run a topology on EC2, but I'm observing the following
>> strange issue:
>>
>> Some workers stop processing after a few seconds, without any error
>> in the worker log.
>>
>> For example, my topology consists of 3 workers and each worker is
>> running on its own EC2 node. Two of them stop processing after a few
>> seconds. But they have already processed several tuples successfully.
>>
>> I'm using only one spout and shuffleGrouping at all bolts. If I add
>> more spouts then all workers keep processing, but the performance is
>> very bad.
>>
>> Does anyone have a guess why this happens?
>>
>> The topology is currently running at: http://54.155.156.203:8080
>>
>> Thanks!
>>
>> Martin
>>
>>
>>
>
> Email had 4 attachments:
>  * topology.jpeg 161k (image/jpeg)
>  * component.jpeg 183k (image/jpeg)
>  * supervisor.log 7k (application/octet-stream)
>  * worker.log 37k (application/octet-stream)
