hadoop-hdfs-user mailing list archives

From Dhaval Shah <prince_mithi...@yahoo.co.in>
Subject Re: Question about Flume
Date Wed, 22 Jan 2014 22:58:45 GMT
Flume is useful for online log aggregation in a streaming fashion. Your use case looks more
like a batch job, where you just need to grab the files and put them in HDFS at regular intervals.
That is much easier to achieve with a bash script run from cron.
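
A minimal sketch of what such a script might look like, under stated assumptions: the host
name, directories, and the RUN_TRANSFER switch are all hypothetical placeholders (none of them
come from this thread), and key-based SFTP login is assumed so that it can run unattended.

```shell
#!/usr/bin/env bash
# Sketch only: host, paths, and RUN_TRANSFER are hypothetical, not from the thread.
set -u

REMOTE="${REMOTE:-user@machine-x}"           # hypothetical SFTP login for machine X
REMOTE_DIR="${REMOTE_DIR:-/data/outgoing}"   # hypothetical directory on machine X
LOCAL_DIR="${LOCAL_DIR:-/tmp/sftp-staging}"  # local staging directory
HDFS_DIR="${HDFS_DIR:-/landing/machine-x}"   # hypothetical HDFS target directory

# Download every file from the remote directory into the staging directory.
fetch_files() {
  mkdir -p "$LOCAL_DIR"
  # "sftp -b -" reads batch commands from stdin; this needs non-interactive auth.
  printf 'lcd %s\ncd %s\nget *\n' "$LOCAL_DIR" "$REMOTE_DIR" | sftp -b - "$REMOTE"
}

# Copy each staged file into HDFS; delete the local copy only if the put succeeds.
load_to_hdfs() {
  local f
  for f in "$LOCAL_DIR"/*; do
    [ -f "$f" ] || continue
    if hdfs dfs -put -f "$f" "$HDFS_DIR/"; then
      rm -f -- "$f"
    else
      echo "failed to load $f" >&2
    fi
  done
}

# Run the transfer only when explicitly requested (e.g. from the cron entry).
if [ "${RUN_TRANSFER:-0}" = "1" ]; then
  fetch_files && load_to_hdfs
fi
```

A crontab entry such as `*/15 * * * * RUN_TRANSFER=1 /path/to/fetch_to_hdfs.sh` (path
hypothetical) would run it every 15 minutes. The sketch does not track which remote files were
already fetched, so you would still need to remove or move them on machine X, or skip names
already present in HDFS.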



From: Kaalu Singh <kaalusingh1234@gmail.com>
To: user@hadoop.apache.org 
Sent: Wednesday, 22 January 2014 5:52 PM
Subject: Question about Flume


I have the following use case:

I have data files getting generated frequently on a certain machine, X. The only way I can
bring them into my Hadoop cluster  is by SFTPing at certain intervals of time and getting
them and landing them in HDFS.  

I am new to Hadoop and to Flume. I read up about Flume and it seems like this framework is
appropriate for something like this although I did not see an available 'source' that can
do exactly what I am looking for. Unavailability of a 'source' plugin is not a deal breaker
for me as I can write one but first I want to make sure this is the right way to go. So, my
questions are:

1. What are the pros/cons of using Flume for this use case? 
2. Does anybody know of a source plugin that does what I am looking for? 
3. Does anybody think I should not use Flume and instead write my own application to achieve
this use case?

