flume-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Sagar Mehta <sagarme...@gmail.com>
Subject Re: Flume Ng replaying events when the source is idle
Date Tue, 05 Mar 2013 17:53:33 GMT
Hi Guys,

So I disabled puppet on one of the boxes and yes there have been no replays.

So essentially to summarize -

   - Puppet was re-writing the flume config file [expand a template] every
   30 mins.
   - This caused Flume to reload the configuration.
   - As a result the exec source was getting re-executed which caused the
   replay of events since the logFile had no changes. [From Flume's point of
   view these are still new events :)]

*@Mike - Your suggestion totally makes sense and I'm going to try that now
in a test environment. Do you recommend it for production use?*

Sagar

On Mon, Mar 4, 2013 at 7:10 PM, Mike Percy <mpercy@apache.org> wrote:

> Sagar,
> Just try "tail -F" on the same file over and over on the command line. It
> will display the last few lines.
>
> If you want to avoid this, try "tail -F -n 0 filename" and you should not
> see this. Every time you reload your configuration file, the specified
> command is re-executed by the source.
>
> Regards,
> Mike
>
>
>
> On Mon, Mar 4, 2013 at 4:13 PM, Hari Shreedharan <
> hshreedharan@cloudera.com> wrote:
>
>>  Flume will reload the configuration file every time it is modified.
>> Since puppet rewrites it, Flume reloads it. The events are probably
>> replayed because of the transactions being incomplete or something like
>> that. File Channel will not replay the events if they have been completely
>> persisted to HDFS and transaction closed. If pupper does not rewrite the
>> config file, do you see this issue?
>>
>> --
>> Hari Shreedharan
>>
>> On Monday, March 4, 2013 at 3:06 PM, Sagar Mehta wrote:
>>
>> I think we found the issue, not sure if this is the root cause but looks
>> highly correlated.
>>
>> So we manage configs using puppet which currently runs in a cron mode
>> with following configuration
>>
>> ## puppetrun Cron Job
>> 20,50 * * * *  root sleep $((RANDOM\%60)) > /dev/null 2>&1; puppet agent
>> --onetime --no-daemonize --logdest syslog > /dev/null 2>&1
>>
>>  *Note - the times at which puppet is run along with the time-stamps in
>> the listing below.*
>>
>> Also after combing through flume logs, we noticed Flume is reloading the
>> configuration after every puppet run
>>
>> sagar@drspock ~/temp $ cat flume.log.2013-03-03 | egrep -i "reloading" |
>> head -5
>> 2013-03-03 00:20:44,174 [conf-file-poller-0] INFO
>>  org.apache.flume.conf.properties.PropertiesFileConfigurationProvider -
>> Reloading configuration file:/opt/flume/conf/hdfs.conf
>> 2013-03-03 00:51:14,374 [conf-file-poller-0] INFO
>>  org.apache.flume.conf.properties.PropertiesFileConfigurationProvider -
>> Reloading configuration file:/opt/flume/conf/hdfs.conf
>> 2013-03-03 01:21:15,072 [conf-file-poller-0] INFO
>>  org.apache.flume.conf.properties.PropertiesFileConfigurationProvider -
>> Reloading configuration file:/opt/flume/conf/hdfs.conf
>> 2013-03-03 01:51:15,778 [conf-file-poller-0] INFO
>>  org.apache.flume.conf.properties.PropertiesFileConfigurationProvider -
>> Reloading configuration file:/opt/flume/conf/hdfs.conf
>> 2013-03-03 02:20:46,481 [conf-file-poller-0] INFO
>>  org.apache.flume.conf.properties.PropertiesFileConfigurationProvider -
>> Reloading configuration file:/opt/flume/conf/hdfs.conf
>>
>> The way we have our current setup, the flume config file
>> namely /opt/flume/conf/hdfs.conf is re-written after every puppet run due
>> to variable interpolation in the template.
>>
>>  *We are still not sure what is causing Flume to reload the config file,
>> and even if the file is reloaded why are the same events getting replayed
>> [the state should be saved somewhere on disk - thats what the file channel
>> is for I thought]*
>>
>> Any pointers/insights appreciated.
>>
>> Sagar
>>
>>
>> On Mon, Mar 4, 2013 at 2:42 PM, Sagar Mehta <sagarmehta@gmail.com> wrote:
>>
>> Guys,
>>
>> Yes this issue was also seen in the memory channel. In fact when we moved
>> to File based channel, we initially thought  this issue won't occur since
>> it stores check points.
>>
>> Anyways below are all files for collector110 [whose source didn't receive
>> any events] and you can see all the replays below. I have attached the
>> corresponding flume log file for the same day.
>>
>> hadoop@jobtracker301:/home/smehta$ hls
>> /ngpipes-raw-logs/2013-03-03/*/collector110* |  head -5
>> -rw-r--r--   3 hadoop supergroup       1594 2013-03-03 00:20
>> /ngpipes-raw-logs/2013-03-03/0000/collector110.ngpipes.sac.ngmoco.com.1362270044367.gz
>> -rw-r--r--   3 hadoop supergroup       1594 2013-03-03 00:51
>> /ngpipes-raw-logs/2013-03-03/0000/collector110.ngpipes.sac.ngmoco.com.1362271875065.gz
>> -rw-r--r--   3 hadoop supergroup       1594 2013-03-03 01:21
>> /ngpipes-raw-logs/2013-03-03/0100/collector110.ngpipes.sac.ngmoco.com.1362273675770.gz
>> -rw-r--r--   3 hadoop supergroup       1594 2013-03-03 01:51
>> /ngpipes-raw-logs/2013-03-03/0100/collector110.ngpipes.sac.ngmoco.com.1362275476474.gz
>> -rw-r--r--   3 hadoop supergroup       1594 2013-03-03 02:20
>> /ngpipes-raw-logs/2013-03-03/0200/collector110.ngpipes.sac.ngmoco.com.1362277246704.gz
>>
>> Also in the attached flume log, you can see the replay stuff I'm talking
>> about - Please note the source received no events during this time.
>>
>> sagar@drspock ~/temp $ cat flume.log.2013-03-03 | egrep -i "Queue Size
>> after replay" | head
>> 2013-03-03 00:20:44,355 [lifecycleSupervisor-1-3] INFO
>>  org.apache.flume.channel.file.FileChannel - Queue Size after replay: 10
>> [channel=channel1]
>> 2013-03-03 00:20:44,356 [lifecycleSupervisor-1-4] INFO
>>  org.apache.flume.channel.file.FileChannel - Queue Size after replay: 10
>> [channel=channel2]
>> 2013-03-03 00:51:14,571 [lifecycleSupervisor-1-7] INFO
>>  org.apache.flume.channel.file.FileChannel - Queue Size after replay: 0
>> [channel=channel2]
>> 2013-03-03 00:51:14,577 [lifecycleSupervisor-1-1] INFO
>>  org.apache.flume.channel.file.FileChannel - Queue Size after replay: 10
>> [channel=channel1]
>> 2013-03-03 01:21:15,276 [lifecycleSupervisor-1-8] INFO
>>  org.apache.flume.channel.file.FileChannel - Queue Size after replay: 0
>> [channel=channel2]
>> 2013-03-03 01:21:15,281 [lifecycleSupervisor-1-7] INFO
>>  org.apache.flume.channel.file.FileChannel - Queue Size after replay: 10
>> [channel=channel1]
>> 2013-03-03 01:51:15,979 [lifecycleSupervisor-1-9] INFO
>>  org.apache.flume.channel.file.FileChannel - Queue Size after replay: 0
>> [channel=channel2]
>> 2013-03-03 01:51:15,985 [lifecycleSupervisor-1-5] INFO
>>  org.apache.flume.channel.file.FileChannel - Queue Size after replay: 10
>> [channel=channel1]
>> 2013-03-03 02:20:46,697 [lifecycleSupervisor-1-2] INFO
>>  org.apache.flume.channel.file.FileChannel - Queue Size after replay: 10
>> [channel=channel1]
>> 2013-03-03 02:20:46,697 [lifecycleSupervisor-1-8] INFO
>>  org.apache.flume.channel.file.FileChannel - Queue Size after replay: 10
>> [channel=channel2]
>>
>> As for the contents of the file, yes they are exactly the same 10 lines
>> of events replayed over and over - I checked that.
>>
>> Let me know if you guys have any insights into this or if this is a bug
>> in Flume Ng.
>>
>> Sagar
>>
>>
>> On Thu, Feb 28, 2013 at 2:59 PM, Hari Shreedharan <
>> hshreedharan@cloudera.com> wrote:
>>
>>  Can also send the flume agent logs? Did you check the contents of the
>> files?
>>
>> --
>> Hari Shreedharan
>>
>> On Thursday, February 28, 2013 at 2:43 PM, Roshan Naik wrote:
>>
>> would you be able to you verify if the same problem can be reproduced by
>> using the memory channel instead in a test setup ?
>>
>>
>> On Wed, Feb 27, 2013 at 11:37 AM, Sagar Mehta <sagarmehta@gmail.com>wrote:
>>
>> Hi Guys,
>>
>> I'm using Flume-Ng and it is working pretty well except for a weird
>> situation which I observed lately. In essence I'm using an exec source for
>> doing  tail -F on a logfile and using two HDFS sinks with a File channel.
>>
>> However I have observed that when the source [ logfile of a jetty based
>> collector] is idle - that is no new events are pushed to the logFile,
>> FlumeNg seems to replay the same set of events.
>>
>> For example collector110 received no events for 2 subsequent hours and
>> below are the corresponding Flume written files at the HDFS sink
>>
>> hadoop@jobtracker301:/home/hadoop/sagar$ hls
>> /ngpipes-raw-logs/2013-02-27/1400/collector110*
>> -rw-r--r--   3 hadoop supergroup        441 2013-02-27 14:20
>> /ngpipes-raw-logs/2013-02-27/1400/collector110.ngpipes.sac.ngmoco.com.1361974853210.gz
>> -rw-r--r--   3 hadoop supergroup        441 2013-02-27 14:50
>> /ngpipes-raw-logs/2013-02-27/1400/collector110.ngpipes.sac.ngmoco.com.1361976653432.gz
>>
>> hadoop@jobtracker301:/home/hadoop/sagar$ hls
>> /ngpipes-raw-logs/2013-02-27/1500/collector110*
>> -rw-r--r--   3 hadoop supergroup        441 2013-02-27 15:20
>> /ngpipes-raw-logs/2013-02-27/1500/collector110.ngpipes.sac.ngmoco.com.1361978454123.gz
>> -rw-r--r--   3 hadoop supergroup        441 2013-02-27 15:50
>> /ngpipes-raw-logs/2013-02-27/1500/collector110.ngpipes.sac.ngmoco.com.1361980254338.gz
>>
>> hadoop@jobtracker301:/home/hadoop/sagar$ md5sum *
>> c7360ef5c8deaee3ce9f4c92e9d9be63
>>  collector110.ngpipes.sac.ngmoco.com.1361974853210.gz
>> c7360ef5c8deaee3ce9f4c92e9d9be63
>>  collector110.ngpipes.sac.ngmoco.com.1361976653432.gz
>> c7360ef5c8deaee3ce9f4c92e9d9be63
>>  collector110.ngpipes.sac.ngmoco.com.1361978454123.gz
>> c7360ef5c8deaee3ce9f4c92e9d9be63
>>  collector110.ngpipes.sac.ngmoco.com.1361980254338.gz
>>
>> As you can see above the md5sums match.
>>
>> I'm using a File channel which has checkpoints, so I'm not sure what is
>> going on. Btw looks like the difference in timestamps of the two replays is
>> exactly 30 mins.
>>
>> *Is this a known bug or am I missing something?*
>> *
>> *
>> *Below is my Flume config file*
>>
>> smehta@collector110:/opt/flume/conf$ cat hdfs.conf
>> # An hdfs sink to write events to the hdfs on the test cluster
>> # A memory based channel to connect the above source and sink
>>
>> # Name the components on this agent
>> collector110.sources = source1
>> collector110.sinks = sink1 sink2
>> collector110.channels = channel1 channel2
>>
>> # Configure the source
>> collector110.sources.source1.type = exec
>> collector110.sources.source1.command = tail -F /opt/jetty/logFile.log
>>
>> # Configure the interceptors
>> collector110.sources.source1.interceptors = TimestampInterceptor
>> HostInterceptor
>>
>> # We use the Timestamp interceptor to get timestamps of when flume
>> receives events
>> # This is used for figuring out the bucket to which an event goes
>> collector110.sources.source1.interceptors.TimestampInterceptor.type =
>> timestamp
>>
>> # We use the Host interceptor to populate the host header with the fully
>> qualified domain name of the collector.
>> # That way we know which file in the sink respresents which collector.
>> collector110.sources.source1.interceptors.HostInterceptor.type =
>> org.apache.flume.interceptor.HostInterceptor$Builder
>> collector110.sources.source1.interceptors.HostInterceptor.preserveExisting
>> = false
>> collector110.sources.source1.interceptors.HostInterceptor.useIP = false
>> collector110.sources.source1.interceptors.HostInterceptor.hostHeader =
>> host
>>
>>
>> # Configure the sink
>>
>> collector110.sinks.sink1.type = hdfs
>>
>> # Configure the bucketing
>> collector110.sinks.sink1.hdfs.path=hdfs://
>> namenode3001.ngpipes.milp.ngmoco.com:9000/ngpipes-raw-logs/%Y-%m-%d/%H00
>>
>> # Prefix the file with the source so that we know where the events in the
>> file came from
>> collector110.sinks.sink1.hdfs.filePrefix = %{host}
>>
>> # We roll the flume output file based on time interval - currently every
>> 5 minutes
>> collector110.sinks.sink1.hdfs.rollSize = 0
>> collector110.sinks.sink1.hdfs.rollCount = 0
>> collector110.sinks.sink1.hdfs.rollInterval = 300
>>
>> #gzip compression related settings
>> collector110.sinks.sink1.hdfs.codeC = gzip
>> collector110.sinks.sink1.hdfs.fileType = CompressedStream
>> collector110.sinks.sink1.hdfs.fileSuffix = .gz
>>
>> # Configure the sink
>>
>> collector110.sinks.sink2.type = hdfs
>>
>> # Configure the bucketing
>> collector110.sinks.sink2.hdfs.path=hdfs://
>> namenode5001.ngpipes.sac.ngmoco.com:9000/ngpipes-raw-logs/%Y-%m-%d/%H00
>>
>> # Prefix the file with the source so that we know where the events in the
>> file came from
>> collector110.sinks.sink2.hdfs.filePrefix = %{host}
>>
>> # We roll the flume output file based on time interval - currently every
>> 5 minutes
>> collector110.sinks.sink2.hdfs.rollSize = 0
>> collector110.sinks.sink2.hdfs.rollCount = 0
>> collector110.sinks.sink2.hdfs.rollInterval = 300
>>
>> #gzip compression related settings
>> collector110.sinks.sink2.hdfs.codeC = gzip
>> collector110.sinks.sink2.hdfs.fileType = CompressedStream
>> collector110.sinks.sink2.hdfs.fileSuffix = .gz
>>
>> # Configure the channel that connects the source to the sink
>>
>> # Use a channel which buffers events in filesystem
>> collector110.channels.channel1.type = file
>> collector110.channels.channel1.checkpointDir =
>> /data/flume_data/channel1/checkpoint
>> collector110.channels.channel1.dataDirs = /data/flume_data/channel1/data
>>
>> # Use a channel which buffers events in filesystem
>> collector110.channels.channel2.type = file
>> collector110.channels.channel2.checkpointDir =
>> /data/flume_data/channel2/checkpoint
>> collector110.channels.channel2.dataDirs = /data/flume_data/channel2/data
>>
>> # Bind the source and sink to the channel configured above
>> collector110.sources.source1.channels = channel1 channel2
>> collector110.sinks.sink1.channel = channel1
>> collector110.sinks.sink2.channel = channel2
>>
>> Sagar
>>
>>
>>
>>
>>
>>
>>
>>
>

Mime
View raw message