flume-user mailing list archives

From "Wang, Yongkun | Yongkun | BDD" <yongkun.w...@mail.rakuten.com>
Subject Re: sleep() in script doesn't work when called by exec Source
Date Fri, 23 Aug 2013 05:26:44 GMT
If it happened at the last hop in your test, it could also happen at the first hop.
Maybe the network is not fast in my test; I got "ChannelException: The channel has reached
it's capacity." on either the agent side (first hop) or the collector side (last hop, sinking to Hadoop).

My configuration of agent:

agent1.sources = spd1
agent1.sources.spd1.type = spooldir
agent1.sources.spd1.spoolDir = /log/flume-ng/agent1/spooldir/spd1
agent1.sources.spd1.deserializer.maxLineLength = 8192
agent1.sources.spd1.channels = file1

agent1.channels = file1
agent1.channels.file1.type = file
agent1.channels.file1.checkpointDir = /log/flume-ng/agent1/checkpoint
agent1.channels.file1.dataDirs = /log/flume-ng/agent1/data
agent1.channels.file1.capacity = 2000000
agent1.channels.file1.transactionCapacity = 100

agent1.sinks = avro1
agent1.sinks.avro1.type = avro
agent1.sinks.avro1.channel = file1
agent1.sinks.avro1.hostname = remote_host
agent1.sinks.avro1.port = 33333
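For comparison, the collector side (last hop) pairs an Avro source with an HDFS sink. The sketch below is my assumption about that side, not the actual config from this test; the bind address, channel sizing, and HDFS path are placeholders:

```
collector1.sources = avro1
collector1.sources.avro1.type = avro
collector1.sources.avro1.bind = 0.0.0.0
collector1.sources.avro1.port = 33333
collector1.sources.avro1.channels = file1

collector1.channels = file1
collector1.channels.file1.type = file
collector1.channels.file1.checkpointDir = /log/flume-ng/collector1/checkpoint
collector1.channels.file1.dataDirs = /log/flume-ng/collector1/data
collector1.channels.file1.capacity = 2000000
collector1.channels.file1.transactionCapacity = 100

collector1.sinks = hdfs1
collector1.sinks.hdfs1.type = hdfs
collector1.sinks.hdfs1.channel = file1
collector1.sinks.hdfs1.hdfs.path = hdfs://namenode/flume/events
```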


Best Regards,
Yongkun Wang

On 2013/08/21, at 1:15, Paul Chavez wrote:

Yes, I am curious what you mean as well. When testing I had dropped a few 15GB files in the
spoolDir and while they processed slowly they did complete. In fact, my only issue with that
test was the last hop HDFS sinks couldn’t keep up and I had to add a couple more to keep
upstream channels from filling up.
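Adding sinks as Paul describes just means attaching more HDFS sinks to the same channel; each sink drains the channel in its own thread, so aggregate throughput goes up. A rough sketch (the names and path are placeholders, not Paul's actual config); note the distinct filePrefix values so the two sinks don't collide on file names:

```
collector1.sinks = hdfs1 hdfs2
collector1.sinks.hdfs1.type = hdfs
collector1.sinks.hdfs1.channel = file1
collector1.sinks.hdfs1.hdfs.path = hdfs://namenode/flume/events
collector1.sinks.hdfs1.hdfs.filePrefix = sink1
collector1.sinks.hdfs2.type = hdfs
collector1.sinks.hdfs2.channel = file1
collector1.sinks.hdfs2.hdfs.path = hdfs://namenode/flume/events
collector1.sinks.hdfs2.hdfs.filePrefix = sink2
```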


From: Brock Noland [mailto:brock@cloudera.com]
Sent: Tuesday, August 20, 2013 7:59 AM
To: user@flume.apache.org
Subject: Re: sleep() in script doesn't work when called by exec Source


Can you share the details of this?  It shouldn't die with large files.

On Tue, Aug 20, 2013 at 3:43 AM, Wang, Yongkun | Yongkun | BDD <yongkun.wang@mail.rakuten.com> wrote:
Thanks Brock.

I tried the spooling directory source; if the file dropped in spoolDir was too large, Flume also died.
There should be some blocking there.
I will start a standalone script process to drop small files instead.
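Such a standalone dropper could be sketched roughly like this; the chunk size, pause, and paths are assumptions. Writing to a temporary name first and then renaming keeps the spooldir source from ever seeing a half-written file:

```python
import os
import time

CHUNK_LINES = 50000          # assumed chunk size per dropped file
SOURCE = "apache.log"        # assumed input log
SPOOL_DIR = "/log/flume-ng/agent1/spooldir/spd1"

def split_into_spooldir(source=SOURCE, spool_dir=SPOOL_DIR,
                        chunk_lines=CHUNK_LINES, pause=10):
    """Split a large log into small files dropped into spoolDir,
    pausing between drops so the channel is not flooded."""
    part, buf = 0, []
    with open(source) as infile:
        for line in infile:
            buf.append(line)
            if len(buf) >= chunk_lines:
                _drop(spool_dir, source, part, buf)
                part, buf = part + 1, []
                time.sleep(pause)
        if buf:
            _drop(spool_dir, source, part, buf)

def _drop(spool_dir, source, part, buf):
    # Write under a temp name, then rename, so the spooldir
    # source never reads a partially written file.
    name = "%s.%05d" % (os.path.basename(source), part)
    tmp = os.path.join(spool_dir, name + ".tmp")
    with open(tmp, "w") as out:
        out.writelines(buf)
    os.rename(tmp, os.path.join(spool_dir, name))
```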

Best Regards,
Yongkun Wang

On 2013/08/19, at 22:08, Brock Noland wrote:

In your case I would look at the spooling directory source.

On Sun, Aug 18, 2013 at 9:29 PM, Wang, Yongkun | Yongkun | BDD <yongkun.wang@mail.rakuten.com> wrote:

I am testing with apache-flume-1.4.0-bin.
I made a naive python script for exec source to do throttling by calling sleep() function.
But the sleep() doesn't work when called by exec source.
Any ideas about this, or do you have some simple solution for throttling instead of a custom script?

Flume config:

agent.sources = src1

agent.sources.src1.type = exec

agent.sources.src1.command = read-file-throttle.py



import sys
import time

count = 0
pre_time = time.time()

with open("apache.log") as infile:
    for line in infile:
        line = line.strip()
        print line
        # flush so the exec source sees lines promptly; stdout is
        # block-buffered when it is a pipe rather than a terminal
        sys.stdout.flush()
        count += 1
        if count % 50000 == 0:
            now_time = time.time()
            diff = now_time - pre_time
            if diff < 10:
                #print "sleeping %s seconds ..." % (10 - diff)
                time.sleep(10 - diff)
            pre_time = now_time
Thank you very much.

Best Regards,
Yongkun Wang

Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
