flume-user mailing list archives

From Máté Gulyás <guly...@dmlab.hu>
Subject Re: Flume NG and S3
Date Tue, 01 Jul 2014 09:53:28 GMT
We run Flume on EC2 instances and sink the aggregates to S3, but
we have to do that (use S3) due to cost constraints.

Mate

On Tue, Jul 1, 2014 at 10:44 AM, Asim Zafir <asim.zafir@gmail.com> wrote:
> You will have to see how much of a performance compromise a Flume sink to
> S3 is going to be. I would highly recommend using Flume 1.5.0+ because of
> the many bug fixes and optimizations it comes with. Moreover, if you can
> afford it, instead of going the S3 route, fire up some EBS volumes on EC2,
> set up an HDFS cluster, and sink the files there. That would be much better
> than going the S3 route.
>
>
> On Tue, Jul 1, 2014 at 1:42 AM, Máté Gulyás <gulyasm@dmlab.hu> wrote:
>>
>> Our current stack has Flume with a socket source and an HDFS sink. We are
>> moving to AWS, and keeping Flume would be a great time saver. Kinesis looks
>> good, but if I can use Flume I would stick with it. Due to the S3 PUT
>> price, we have to aggregate, and Flume does that with the file channel.
>>
>> Mate Gulyas
>>
>> On Tue, Jul 1, 2014 at 10:05 AM, Nitin Pawar <nitinpawar432@gmail.com>
>> wrote:
>> > If you are heavily dependent on the AWS stack, then instead of Kafka you
>> > can look at AWS Kinesis. From there on, there is good integration
>> > available with AWS S3 or any other service you want to dump data to.
>> >
>> >
>> >
>> >
>> > On Tue, Jul 1, 2014 at 1:33 PM, Asim Zafir <asim.zafir@gmail.com> wrote:
>> >>
>> >> Kafka's framework is designed more for scalable read I/O than for a
>> >> massive write event push coming to centralized storage such as HDFS.
>> >>
>> >> Not sure how Flume's Avro sink to S3 would turn out for the entire Flume
>> >> pipeline. I suspect it will be fatal to carry on with a memory channel,
>> >> and even if you have a file channel on the Flume agents/collectors, it
>> >> is very likely to cause buffering on the channel.
>> >>
>> >>
>> >>
>> >>
>> >> On Mon, Jun 30, 2014 at 11:47 PM, Máté Gulyás <gulyasm@dmlab.hu>
>> >> wrote:
>> >>>
>> >>> Please see my comments inline.
>> >>>
>> >>> YIMEN YIMGA Gael wrote:
>> >>> > Could you please share the link to the article you read?
>> >>> https://gist.github.com/crowdmatt/5256881 and the last comment.
>> >>>
>> >>> Sharninder wrote:
>> >>> > No reason to not use Flume, except that S3, since it's over the
>> >>> > wire, will be a lot slower than a local HDFS cluster, in which case
>> >>> > you need a big enough channel to hold events not yet processed out
>> >>> > of the sink. If you have a fast enough pipe, you can very well use
>> >>> > Flume for this sort of use case.
>> >>> I plan to aggregate 5-15 GB of data with the file channel, as I want
>> >>> to flush to S3 every hour on every node. As far as I know, Flume can
>> >>> gzip it, so the size would be about 500 MB to 1.5 GB.
>> >>>
>> >>> Thanks for the feedback, I will write if I have any results.
>> >>>
>> >>> Mate Gulyas
>> >>>
>> >>> On Tue, Jul 1, 2014 at 6:26 AM, Sharninder <sharninder@gmail.com>
>> >>> wrote:
>> >>> > No reason to not use Flume, except that S3, since it's over the
>> >>> > wire, will be a lot slower than a local HDFS cluster, in which case
>> >>> > you need a big enough channel to hold events not yet processed out
>> >>> > of the sink. If you have a fast enough pipe, you can very well use
>> >>> > Flume for this sort of use case.
>> >>> >
>> >>> > The reason the author might have moved to Kafka, and I'm just
>> >>> > speculating here, is that Kafka provides him better buffering
>> >>> > support for exactly the case I've written above.
>> >>> >
>> >>> > HTH
>> >>> > Sharninder
>> >>> >
>> >>> >
>> >>> >
>> >>> > On Mon, Jun 30, 2014 at 7:57 PM, Máté Gulyás <gulyasm@dmlab.hu>
>> >>> > wrote:
>> >>> >>
>> >>> >> Hi!
>> >>> >>
>> >>> >> I would like to use Flume to aggregate and send logs to an S3
>> >>> >> bucket. I did some research, but the last article I found on the
>> >>> >> topic was more than a year old, and the author abandoned Flume for
>> >>> >> Kafka. My other concern is that most of the articles were written
>> >>> >> for Flume OG, not NG.
>> >>> >> Is there any reason why I should not use Flume to sink messages to
>> >>> >> S3?
>> >>> >>
>> >>> >>
>> >>> >> Thanks in advance.
>> >>> >>
>> >>> >> Mate Gulyas
>> >>> >> Lead Developer at Dmlab
>> >>> >
>> >>> >
>> >>
>> >>
>> >
>> >
>> >
>> > --
>> > Nitin Pawar
>
>
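
For reference, the setup discussed in this thread (socket source, file channel, hourly gzip-compressed flush to S3 through the HDFS sink) could be sketched as a Flume NG agent configuration along the following lines. This is only a sketch under assumptions not stated in the thread: the agent and component names (a1, r1, c1, k1), the netcat source type and port, the local directories, the bucket name, and the s3n:// scheme are all hypothetical, and S3 credentials would have to be supplied separately through the Hadoop configuration.

```properties
# Hypothetical agent "a1": socket source -> file channel -> HDFS sink writing to S3.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Socket source (netcat type assumed; the thread only says "Socket source").
a1.sources.r1.type = netcat
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1

# Durable file channel: buffers events on local disk while the sink is slow.
# Directories are placeholders; capacity is counted in events, not bytes.
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /var/flume/checkpoint
a1.channels.c1.dataDirs = /var/flume/data
a1.channels.c1.capacity = 1000000
a1.channels.c1.transactionCapacity = 10000

# HDFS sink pointed at an S3 bucket (bucket name is a placeholder).
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = s3n://example-bucket/logs/%Y/%m/%d/%H
a1.sinks.k1.hdfs.useLocalTimeStamp = true
# Write gzip-compressed output files.
a1.sinks.k1.hdfs.fileType = CompressedStream
a1.sinks.k1.hdfs.codeC = gzip
# Roll one file per hour; disable size-based and count-based rolling.
a1.sinks.k1.hdfs.rollInterval = 3600
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.batchSize = 1000
```

Rolling only on the one-hour interval keeps the number of S3 PUTs low, which is the cost concern raised above. The channel's disk would need to be sized for the 5-15 GB per hour mentioned in the thread, plus headroom for periods when uploads to S3 fall behind.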
