hadoop-common-user mailing list archives

From sudhakara st <sudhakara...@gmail.com>
Subject Re: Question about Flume
Date Thu, 23 Jan 2014 05:25:44 GMT
Hello Kaalu Singh,

Flume is a good match for your requirement. First define the storage structure
of the data in HDFS and how you are going to process it once it is stored
there. If the data volume is very large, Flume supports multi-hop flows,
filtering and aggregation. I don't think a custom source plugin is required;
any command, script or program that converts your data into a stream of bytes
will work with Flume.
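
For illustration, a minimal agent along those lines, using the built-in
spooling directory source and HDFS sink (the agent name, directories and HDFS
path below are only placeholders, adjust them for your cluster):

# spool-to-hdfs.conf -- one source, one channel, one sink
a1.sources  = src1
a1.channels = ch1
a1.sinks    = hdfs1

# watch a local landing directory for completed files
a1.sources.src1.type     = spooldir
a1.sources.src1.spoolDir = /var/flume/incoming
a1.sources.src1.channels = ch1

# buffer events on local disk so a sink outage does not lose data
a1.channels.ch1.type = file

# write the events into HDFS as plain data files, bucketed by day
a1.sinks.hdfs1.type                   = hdfs
a1.sinks.hdfs1.hdfs.path              = /data/incoming/%Y-%m-%d
a1.sinks.hdfs1.hdfs.fileType          = DataStream
a1.sinks.hdfs1.hdfs.useLocalTimeStamp = true
a1.sinks.hdfs1.channel                = ch1

You would then start it with something like:

flume-ng agent --conf conf --conf-file spool-to-hdfs.conf --name a1

Keep in mind the spooling directory source expects files to be dropped in
complete and never modified afterwards, so whatever fetches them from the
source machine should write to a temporary name and rename into the spool
directory when done.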


On Thu, Jan 23, 2014 at 6:31 AM, Dhaval Shah <prince_mithibai@yahoo.co.in> wrote:

> Fair enough. I just wanted to point out that doing it via a script is
> going to be a million times faster to implement compared to something like
> Flume (and arguably more reliable too, with no maintenance overhead). Don't
> get me wrong, we use Flume for our data collection as well, but our use
> case is real-time/online data collection and Flume does the job well, so
> nothing against Flume per se. I was just thinking: if a script becomes a
> pain down the road, how much throwaway effort are we talking about, a few
> minutes to a few hours at most? Whereas if Flume becomes a pain, that could
> mean a few days to a few weeks of throwaway work.
>
> Sent from Yahoo Mail on Android <https://overview.mail.yahoo.com/mobile/?.src=Android>
>
> ------------------------------
> From: Kaalu Singh <kaalusingh1234@gmail.com>
> To: <user@hadoop.apache.org>; Dhaval Shah <prince_mithibai@yahoo.co.in>
> Subject: Re: Question about Flume
> Sent: Wed, Jan 22, 2014 11:20:52 PM
>
> The closest built-in functionality to my use case is the "Spooling
> Directory Source", and I like the idea of using/building software in
> higher-level languages like Java for reasons of extensibility etc. (and
> don't like the idea of scripts).
>
> However, I am soliciting opinions and can be swayed to change my mind.
>
> Thanks for your response Dhaval - appreciate it.
>
> Regards
> KS
>
>
> On Wed, Jan 22, 2014 at 2:58 PM, Dhaval Shah <prince_mithibai@yahoo.co.in> wrote:
>
>> Flume is useful for online log aggregation in a streaming format. Your
>> use case seems more like a batch format where you just need to grab the
>> file and put it in HDFS at regular intervals, which can be much more
>> easily achieved by a bash script running on a cron schedule.
>>
>> Regards,
>>
>> Dhaval
>>
>>
>> ________________________________
>> From: Kaalu Singh <kaalusingh1234@gmail.com>
>> To: user@hadoop.apache.org
>> Sent: Wednesday, 22 January 2014 5:52 PM
>> Subject: Question about Flume
>>
>>
>>
>> Hi,
>>
>> I have the following use case:
>>
>> I have data files being generated frequently on a certain machine, X.
>> The only way I can bring them into my Hadoop cluster is by SFTPing to it
>> at certain intervals, fetching the files, and landing them in HDFS.
>>
>> I am new to Hadoop and to Flume. I read up about Flume and it seems like
>> this framework is appropriate for something like this although I did not
>> see an available 'source' that can do exactly what I am looking for.
>> Unavailability of a 'source' plugin is not a deal breaker for me as I can
>> write one but first I want to make sure this is the right way to go. So, my
>> questions are:
>>
>> 1. What are the pros/cons of using Flume for this use case?
>> 2. Does anybody know of a source plugin that does what I am looking for?
>> 3. Does anybody think I should not use Flume and instead write my own
>> application to achieve this use case?
>>
>> Thanks
>> KS
>>
>
>
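
For comparison, the cron'd script Dhaval describes above could be as small as
something like this (the host, paths and schedule are only placeholders):

#!/bin/bash
# fetch_and_land.sh -- pull new files from machine X and land them in HDFS
set -e
REMOTE=user@machineX
REMOTE_DIR=/data/exports
LOCAL_DIR=/tmp/landing
HDFS_DIR=/data/incoming

mkdir -p "$LOCAL_DIR"

# pull the files over SFTP in batch mode (commands fed on stdin)
printf 'lcd %s\ncd %s\nmget *\n' "$LOCAL_DIR" "$REMOTE_DIR" | sftp -b - "$REMOTE"

# land each file in HDFS, then remove the local copy
for f in "$LOCAL_DIR"/*; do
  [ -e "$f" ] || continue
  hadoop fs -put "$f" "$HDFS_DIR"/ && rm "$f"
done

run from a crontab entry such as:

*/15 * * * * /usr/local/bin/fetch_and_land.sh

The script is quicker to stand up; Flume buys you the durable channel,
filtering and multi-hop options mentioned above.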


-- 

Regards,
...Sudhakara.st
