flume-user mailing list archives

From "Wang, Yongkun | Yongkun | BDD" <yongkun.w...@mail.rakuten.com>
Subject Re: sleep() in script doesn't work when called by exec Source
Date Tue, 20 Aug 2013 08:44:24 GMT
Hi Paul,

Thank you for the suggestions. I will try it.

Best Regards,
Yongkun Wang

On 2013/08/20, at 2:56, Paul Chavez wrote:

I’ve set up something similar with the spooling directory source. A script scheduled on the
app server creates an incremental file every minute and drops it into the spool directory
for processing. The use case is web logs that roll over daily, where we want events in
‘near’ real time. We didn’t want to use the exec source because it gives no delivery
guarantee. With a spooling directory source, at least, if the Flume agent stops processing,
the incremental files stay in the spool dir until it’s back up.
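
The approach described above could be sketched roughly as follows. This is only an illustration, not Paul's actual script: the function name `export_increment`, the offset file, the staging directory and the `increment-*.log` naming are all assumptions. It copies whatever complete lines were appended to the log since the previous run into a new file, and renames that file into the spool directory only once it is fully written, since the spooling directory source expects files not to change after they appear there.

```python
import os
import tempfile
import time


def export_increment(log_path, offset_path, staging_dir, spool_dir):
    """Copy bytes appended to log_path since the last run into a new spool file.

    Returns the path of the file placed in spool_dir, or None if there was
    no complete new line to hand over. All names here are hypothetical.
    """
    # Read the byte offset saved by the previous run (0 on the first run).
    try:
        with open(offset_path) as f:
            offset = int(f.read().strip() or 0)
    except FileNotFoundError:
        offset = 0

    # Assumption: a log shorter than the saved offset means it rolled over.
    if os.path.getsize(log_path) < offset:
        offset = 0

    with open(log_path, "rb") as log:
        log.seek(offset)
        data = log.read()

    # Hand over complete lines only; a partial last line waits for the next run.
    end = data.rfind(b"\n") + 1
    if end == 0:
        return None
    data = data[:end]

    # Write in the staging directory first, then rename into the spool
    # directory, so the agent never sees a file that is still being written.
    # staging_dir and spool_dir should be on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=staging_dir)
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(data)
    name = "increment-%d-%d.log" % (int(time.time()), offset)
    final_path = os.path.join(spool_dir, name)
    os.rename(tmp_path, final_path)

    # Remember how far we got for the next run.
    with open(offset_path, "w") as f:
        f.write(str(offset + end))
    return final_path
```

Run from cron once a minute, a function like this would give the "near real time" behaviour described, with unprocessed increments simply waiting in the spool directory while the agent is down.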

Hope that helps,
Paul Chavez

From: Wang, Yongkun | Yongkun | BDD [mailto:yongkun.wang@mail.rakuten.com]
Sent: Sunday, August 18, 2013 7:30 PM
To: user@flume.apache.org
Subject: sleep() in script doesn't work when called by exec Source

Hi,

I am testing with apache-flume-1.4.0-bin.
I wrote a naive Python script for the exec source that throttles by calling sleep().
But the sleep() doesn't seem to take effect when the script is run by the exec source.
Any ideas about this, or do you have a simple solution for throttling that avoids a custom
source?

Flume config:


agent.sources = src1
agent.sources.src1.type = exec
agent.sources.src1.command = read-file-throttle.py


read-file-throttle.py:


#!/usr/bin/python

import time

count = 0
pre_time = time.time()
with open("apache.log") as infile:
    for line in infile:
        line = line.strip()
        print line
        count += 1
        if count % 50000 == 0:
            now_time = time.time()
            diff = now_time - pre_time
            if diff < 10:
                #print "sleeping %s seconds ..." % (diff)
                time.sleep(diff)
                pre_time = now_time
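
One thing worth checking in the script above, independent of Flume: it sleeps for `diff`, the time already spent on the batch, rather than for the time remaining in the 10-second window. If 50000 lines are read in a fraction of a second, the sleep is equally short and is easy to mistake for no sleep at all. Also, when stdout is a pipe, Python usually block-buffers it, so lines may reach the reader in bursts that don't line up with the pauses. The thread does not confirm either as the cause. The sketch below is a guess at a fix under those assumptions, written for Python 3; the function name `throttle` and its parameters are hypothetical.

```python
import time


def throttle(lines, out, batch_size=50000, window=10.0,
             clock=time.time, sleep=time.sleep):
    """Write lines to out so that each batch takes at least `window` seconds.

    clock and sleep are injectable only to make the function easy to test.
    Returns the number of lines written.
    """
    count = 0
    batch_start = clock()
    for line in lines:
        out.write(line.strip() + "\n")
        count += 1
        if count % batch_size == 0:
            # Push the batch through the pipe before pausing, so the
            # reader sees it now rather than after the sleep.
            out.flush()
            elapsed = clock() - batch_start
            if elapsed < window:
                # Sleep for the remainder of the window, not the elapsed time.
                sleep(window - elapsed)
            # Start timing the next batch after the pause.
            batch_start = clock()
    out.flush()
    return count
```

In the script this would be called as `throttle(infile, sys.stdout)` inside the `with open("apache.log") as infile:` block, after `import sys`.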



Thank you very much.

Best Regards,
Yongkun Wang

