ambari-user mailing list archives

From Srimanth Gunturi <sgunt...@hortonworks.com>
Subject Re: Flume - always unable to stop 2 flume agents
Date Wed, 18 May 2016 16:11:59 GMT
Hello,

If the Flume agents are receiving the shutdown request but are not shutting down, I would
suggest raising this on the Flume mailing lists at https://flume.apache.org/mailinglists.html.

Regards,

Srimanth




________________________________
From: cs user <acldstkusr@gmail.com>
Sent: Wednesday, May 18, 2016 12:54 AM
To: user@ambari.apache.org
Subject: Re: Flume - always unable to stop 2 flume agents

Hi Srimanth,

Thanks for responding. I've checked the logs, and it seems that the shutdown event is received
and some of the channels (we have 3) are stopped, but the agent just keeps running. For
example, I can see entries like:

18 May 2016 08:13:21,142 INFO  [agent-shutdown-hook] (com.aweber.flume.source.rabbitmq.RabbitMQSource.stop:117)
 - Stopping channel1-source
18 May 2016 08:13:21,142 INFO  [agent-shutdown-hook] (org.apache.flume.instrumentation.MonitoredCounterGroup.stop:149)
 - Component type: SOURCE, name: channel1-source stopped

But it looks like it continues to process events. I can see entries like these repeated over
and over; note that this is around 30 minutes after it tried to stop:

18 May 2016 08:47:06,778 INFO  [pool-57-thread-1] (org.apache.hadoop.metrics2.sink.flume.FlumeTimelineMetricsSink$TimelineMetricsCollector.run:143)
 - Attributes for component SOURCE.channel1-source
18 May 2016 08:47:06,778 INFO  [pool-57-thread-1] (org.apache.hadoop.metrics2.sink.flume.FlumeTimelineMetricsSink$TimelineMetricsCollector.processComponentAttributes:163)
 - EventReceivedCount = 36417
18 May 2016 08:47:06,778 INFO  [pool-57-thread-1] (org.apache.hadoop.metrics2.sink.flume.FlumeTimelineMetricsSink$TimelineMetricsCollector.processComponentAttributes:163)
 - AppendBatchAcceptedCount = 0
18 May 2016 08:47:06,778 INFO  [pool-57-thread-1] (org.apache.hadoop.metrics2.sink.flume.FlumeTimelineMetricsSink$TimelineMetricsCollector.processComponentAttributes:163)
 - EventAcceptedCount = 36417
18 May 2016 08:47:06,778 INFO  [pool-57-thread-1] (org.apache.hadoop.metrics2.sink.flume.FlumeTimelineMetricsSink$TimelineMetricsCollector.processComponentAttributes:163)
 - AppendReceivedCount = 0
18 May 2016 08:47:06,778 INFO  [pool-57-thread-1] (org.apache.hadoop.metrics2.sink.flume.FlumeTimelineMetricsSink$TimelineMetricsCollector.processComponentAttributes:163)
 - StartTime = 1463486595420
18 May 2016 08:47:06,778 INFO  [pool-57-thread-1] (org.apache.hadoop.metrics2.sink.flume.FlumeTimelineMetricsSink$TimelineMetricsCollector.processComponentAttributes:163)
 - AppendAcceptedCount = 0
18 May 2016 08:47:06,779 INFO  [pool-57-thread-1] (org.apache.hadoop.metrics2.sink.flume.FlumeTimelineMetricsSink$TimelineMetricsCollector.processComponentAttributes:163)
 - OpenConnectionCount = 2
18 May 2016 08:47:06,779 INFO  [pool-57-thread-1] (org.apache.hadoop.metrics2.sink.flume.FlumeTimelineMetricsSink$TimelineMetricsCollector.processComponentAttributes:163)
 - AppendBatchReceivedCount = 0
18 May 2016 08:47:06,779 INFO  [pool-57-thread-1] (org.apache.hadoop.metrics2.sink.flume.FlumeTimelineMetricsSink$TimelineMetricsCollector.processComponentAttributes:163)
 - StopTime = 1463555601142
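A minimal sketch for reading these attributes, in Python; the helper names are hypothetical and not part of Flume or Ambari. StartTime and StopTime are epoch milliseconds: StopTime = 1463555601142 is 2016-05-18 07:13:21.142 UTC, which lines up with the 08:13:21,142 shutdown entries above if the log clock runs at UTC+1 (an assumption). Comparing EventReceivedCount across two successive reports is one way to tell whether the source is still taking in events or whether the repeated lines are just the metrics collector thread (pool-57-thread-1) re-reporting a component that has already stopped.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical helpers for reading the attribute lines that
# FlumeTimelineMetricsSink logs, e.g. " - EventReceivedCount = 36417".
ATTR_RE = re.compile(r"^\s*-\s*(\w+)\s*=\s*(-?\d+)\s*$")
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_attributes(lines):
    """Collect numeric 'Name = value' attribute lines into a dict."""
    attrs = {}
    for line in lines:
        match = ATTR_RE.match(line)
        if match:
            attrs[match.group(1)] = int(match.group(2))
    return attrs


def epoch_ms_to_utc(ms):
    """Convert epoch milliseconds (StartTime/StopTime) to an aware UTC datetime."""
    # timedelta arithmetic avoids float rounding of the millisecond part
    return EPOCH + timedelta(milliseconds=ms)


def events_still_flowing(earlier, later):
    """True if EventReceivedCount grew between two successive metric reports."""
    return later["EventReceivedCount"] > earlier["EventReceivedCount"]


if __name__ == "__main__":
    report = parse_attributes([
        " - Attributes for component SOURCE.channel1-source",
        " - EventReceivedCount = 36417",
        " - StartTime = 1463486595420",
        " - StopTime = 1463555601142",
    ])
    print(epoch_ms_to_utc(report["StartTime"]))  # 2016-05-17 12:03:15.420000+00:00
    print(epoch_ms_to_utc(report["StopTime"]))   # 2016-05-18 07:13:21.142000+00:00
```

If EventReceivedCount stays at 36417 from one report to the next, that suggests the source is no longer receiving events, although it says nothing about what else is keeping the agent process alive.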

Is this normal behavior?

We are using this plugin:

https://github.com/aweber/rabbitmq-flume-plugin

I have thought about switching to this plugin to see if the problem goes away:

https://github.com/jcustenborder/flume-ng-rabbitmq

Thanks!






On Tue, May 17, 2016 at 5:29 PM, Srimanth Gunturi <sgunturi@hortonworks.com> wrote:

Hello,

Could you please describe the setup in a bit more detail? Are the 12 Flume agents on 12 different
hosts or on a single host?

Also, have you looked at the Flume logs for those 2 agents to determine what is going
on during the 45 minutes?

Regards,

Srimanth


________________________________
From: cs user <acldstkusr@gmail.com>
Sent: Tuesday, May 17, 2016 4:44 AM
To: user@ambari.apache.org
Subject: Flume - always unable to stop 2 flume agents

Hello,

We have 12 Flume agents. Whenever we change the config and need to restart the affected nodes,
we always end up with 2 Flume agents that refuse to stop. It takes multiple attempts (sometimes
as long as 45 minutes) to eventually stop them; you have to keep retrying the restart.

Has anyone else seen this? Is there a workaround?

Thanks!

