flume-user mailing list archives

From ltcuong211 <ltcuong...@gmail.com>
Subject Re: Use flume to copy data in local directory (hadoop server) into hdfs
Date Thu, 24 Oct 2013 15:35:13 GMT
Hi Jeff & JS,

I tried using the spooling directory source & a memory channel. It still 
takes ~4 minutes to copy 1 GB of data into HDFS.
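
For reference, this is roughly the spooling-dir + memory-channel setup I 
tried (the agent name, spool directory and HDFS path below are placeholders, 
not my exact values):

agent.sources = spoolSrc
agent.channels = memChannel
agent.sinks = hdfsSink

# spooling directory source: picks up completed files dropped into spoolDir
agent.sources.spoolSrc.type = spooldir
agent.sources.spoolSrc.spoolDir = /local-dir
agent.sources.spoolSrc.channels = memChannel

# memory channel instead of the file channel from my first config
agent.channels.memChannel.type = memory
agent.channels.memChannel.capacity = 100000
agent.channels.memChannel.transactionCapacity = 10000

# HDFS sink with the same roll settings as before
agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.channel = memChannel
agent.sinks.hdfsSink.hdfs.path = hdfs://namenode/flume/data
agent.sinks.hdfsSink.hdfs.fileType = DataStream
agent.sinks.hdfsSink.hdfs.rollInterval = 0
agent.sinks.hdfsSink.hdfs.rollSize = 262144000
agent.sinks.hdfsSink.hdfs.rollCount = 0
agent.sinks.hdfsSink.hdfs.batchSize = 10000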

By the way, thanks for suggesting the spooling source. I think it is a 
better fit than exec + cat in my case.

Cuong LUU

On 21/10/2013 22:50, Jeff Lord wrote:
> Luu,
>
> Have you tried using the spooling directory source?
>
> -Jeff
>
>
> On Mon, Oct 21, 2013 at 3:25 AM, Cuong Luu <ltcuong211@gmail.com 
> <mailto:ltcuong211@gmail.com>> wrote:
>
>     Hi all,
>
>     I need to copy data from a local directory (on the hadoop server) into
>     HDFS regularly and automatically. This is my flume config:
>
>     agent.sources = execSource
>     agent.channels = fileChannel
>     agent.sinks = hdfsSink
>
>     agent.sources.execSource.type = exec
>
>     agent.sources.execSource.shell = /bin/bash -c
>     agent.sources.execSource.command = for i in /local-dir/*; do cat
>     $i; done
>
>     agent.sources.execSource.restart = true
>     agent.sources.execSource.restartThrottle = 3600000
>     agent.sources.execSource.batchSize = 100
>
>     ...
>     agent.sinks.hdfsSink.hdfs.rollInterval = 0
>     agent.sinks.hdfsSink.hdfs.rollSize = 262144000
>     agent.sinks.hdfsSink.hdfs.rollCount = 0
>     agent.sinks.hdfsSink.batchsize = 100000
>     ...
>     agent.channels.fileChannel.type = FILE
>     agent.channels.fileChannel.capacity = 100000
>     ...
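>
>     (For reference, I start the agent with something like the following; the
>     conf directory and config file name here are just placeholders:)
>
>     flume-ng agent --conf /etc/flume/conf --conf-file flume.conf --name agent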
>
>     While the hadoop command takes about 30 seconds, Flume takes around 4
>     minutes to copy a 1 GB text file into HDFS. I am wondering whether my
>     config is bad, or whether I simply shouldn't use Flume in this case.
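>
>     (The hadoop command I am comparing against is essentially a plain put,
>     something like this; paths are placeholders:)
>
>     hadoop fs -put /local-dir/bigfile.txt /hdfs-target-dir/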
>
>     How about your opinion?
>
>

