flume-user mailing list archives

From Cuong Luu <ltcuong...@gmail.com>
Subject Use flume to copy data in local directory (hadoop server) into hdfs
Date Mon, 21 Oct 2013 10:25:24 GMT
Hi all,

I need to copy data in a local directory (hadoop server) into hdfs
regularly and automatically. This is my flume config:

agent.sources = execSource
agent.channels = fileChannel
agent.sinks = hdfsSink

agent.sources.execSource.type = exec

agent.sources.execSource.shell = /bin/bash -c
agent.sources.execSource.command = for i in /local-dir/*; do cat $i; done

agent.sources.execSource.restart = true
agent.sources.execSource.restartThrottle = 3600000
agent.sources.execSource.batchSize = 100

...
agent.sinks.hdfsSink.hdfs.rollInterval = 0
agent.sinks.hdfsSink.hdfs.rollSize = 262144000
agent.sinks.hdfsSink.hdfs.rollCount = 0
agent.sinks.hdfsSink.hdfs.batchSize = 100000
...
agent.channels.fileChannel.type = FILE
agent.channels.fileChannel.capacity = 100000
...
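
For ingesting whole files from a local directory, Flume's Spooling Directory Source is the usual alternative to cat-ing files through an exec source: it tracks which files have been consumed, so the agent can run continuously without re-reading data. A minimal sketch, reusing the channel name from the config above (the batch size value is illustrative):

```
agent.sources = spoolSource
agent.sources.spoolSource.type = spooldir
agent.sources.spoolSource.spoolDir = /local-dir
agent.sources.spoolSource.channels = fileChannel
# Completed files are renamed with a .COMPLETED suffix by default
agent.sources.spoolSource.fileSuffix = .COMPLETED
agent.sources.spoolSource.batchSize = 1000
```

Note that the spooling source requires files to be immutable once dropped into the directory; writing or renaming a file after the source has opened it causes an error.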

While the hadoop command takes about 30 seconds, Flume takes around 4 minutes to copy a 1 GB text file into HDFS. I am worried that either the config is not good, or that Flume shouldn't be used in this case at all.

What is your opinion?
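
For comparison, if none of Flume's features (transactional channels, interceptors, fan-out) are needed and the goal is only a regular one-way copy, the plain hadoop CLI can be scheduled directly. A hypothetical cron sketch, where the HDFS target path and the done/ subdirectory are illustrative, not from the original setup:

```shell
# Hypothetical crontab entry: every hour, put new local files into HDFS,
# then move them aside so the next run does not copy them twice.
0 * * * * hadoop fs -put /local-dir/* /user/hadoop/incoming/ && mv /local-dir/* /local-dir/done/
```

This keeps the fast hadoop-fs copy path but gives up Flume's delivery guarantees and restart handling.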
