From: Arvind Prabhakar
Date: Fri, 18 May 2012 14:16:23 -0700
Subject: Re: New production setup
To: flume-user@incubator.apache.org

Hi Mahesh,

The concepts of Flume 1.x (NG) are different from Flume 0.9.x.
For a quick primer on the changed concepts, please glance through the blog post we published earlier [2]. Due to these changes, components developed for earlier versions of Flume are not compatible with the new implementation.

Regarding implementing custom sinks in Flume 1.x: it is fairly straightforward. You create an implementation of the interface org.apache.flume.Sink. If your implementation class is com.example.custom.MySink, you can plug it into the system via the following configuration:

agent.channels = c1
agent.sinks = s1

agent.sinks.s1.type = com.example.custom.MySink
agent.sinks.s1.channel = c1
agent.sinks.s1.sink_property = value
...

Any configuration within the agent.sinks.s1 namespace will be passed to the configure() method implemented by your sink before it is start()ed. If the system shuts down, the sink will be stop()ped first.

For an even easier route to implementing custom sinks for Flume 1.x, simply extend an existing sink such as LoggerSink and override the process() method.

Hope this helps.

Thanks,
Arvind Prabhakar

[2] https://blogs.apache.org/flume/entry/flume_ng_architecture

On Fri, May 18, 2012 at 2:04 PM, M@he$h wrote:
> Hello Arvind,
>
> I was using the flume-0.9.x version and had everything working nicely; the
> only issue I had was tailing a specific file, which is under discussion in
> another thread. My question is: I had my own regexAll extractor and
> HBase sink Java programs, so if I upgrade to the flume-NG version, can I
> still use the custom extractor and HBase sink programs with flume-NG?
>
> The flume-NG wiki,
> http://archive.cloudera.com/cdh4/cdh/4/flume-ng-1.1.0-cdh4.0.0b2/FlumeUserGuide.html,
> does not give much explanation or many samples on how to use custom sinks.
> Could you please let me know about it?
>
> I look forward to your response.
>
>
> On Fri, May 18, 2012 at 8:54 AM, Arvind Prabhakar wrote:
>
>> Hi Simon,
>>
>> The wiki page is dated, to say the least.
>> At the moment there are many
>> active deployments of Flume NG that are in staging if not production. I
>> encourage you to look at the performance numbers that were recently
>> published on the wiki [1].
>>
>> The use case you have described seems like something that Flume should be
>> able to handle very easily. I encourage you to look at the log4j appender,
>> the Memory/File channels, and the HDFS event sink. Of course, you could
>> plan on using other components as well if these do not fit well with your
>> application.
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/FLUME/Flume+NG+Performance+Measurements
>>
>> Thanks,
>> Arvind Prabhakar
>>
>>
>> On Fri, May 18, 2012 at 4:58 AM, Simon Kelly wrote:
>>
>>> Hi
>>>
>>> I'm interested in using Flume to store audit logs in HDFS, which can then
>>> be queried with Hive. I see that the links on the Flume page point to
>>> Flume NG, which says it's not ready for production use yet. Is that still
>>> the case?
>>>
>>> Our use case would likely look something like this:
>>>
>>> - 15 servers running a Java web server and logging audit data (1-2K
>>> per event, 20-90 events per second per server)
>>> - Hadoop running on a 5-machine cluster (4x2.4GHz processors, 8GB RAM,
>>> 8TB total storage)
>>>
>>> It's important that all data makes it into HDFS.
>>>
>>> I'd appreciate any comments on how to proceed with this.
>>>
>>> Best regards
>>> Simon Kelly
>>
>
> --
> Thanks and Regards,
> Mahesh
> 619-816-7011.
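[Editor's note: the custom-sink steps Arvind describes above can be sketched as a minimal Java class. This is an illustrative example, not code from the thread; the package, class name, and "sink_property" key follow the configuration snippet in the reply, and the body of process() uses the standard Flume 1.x channel-transaction pattern.]

```java
package com.example.custom;

import org.apache.flume.Channel;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.Transaction;
import org.apache.flume.conf.Configurable;
import org.apache.flume.sink.AbstractSink;

// Minimal sketch of a custom Flume 1.x sink. AbstractSink supplies the
// channel wiring and the start()/stop() lifecycle hooks.
public class MySink extends AbstractSink implements Configurable {

  private String sinkProperty;

  @Override
  public void configure(Context context) {
    // Receives the agent.sinks.s1.* properties before start() is called.
    // "sink_property" is the hypothetical key from the example config.
    sinkProperty = context.getString("sink_property", "default-value");
  }

  @Override
  public Status process() throws EventDeliveryException {
    Channel channel = getChannel();
    Transaction txn = channel.getTransaction();
    txn.begin();
    try {
      Event event = channel.take();
      if (event == null) {
        // Channel is empty; commit and let the sink runner back off.
        txn.commit();
        return Status.BACKOFF;
      }
      // Deliver event.getBody() to the destination system here.
      txn.commit();
      return Status.READY;
    } catch (Throwable t) {
      txn.rollback();
      throw new EventDeliveryException("Failed to deliver event", t);
    } finally {
      txn.close();
    }
  }
}
```

Compiled against the Flume 1.x libraries and placed on the agent's classpath, the class is then referenced by the agent.sinks.s1.type property shown in the configuration above.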
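[Editor's note: the log4j appender / file channel / HDFS sink pipeline suggested to Simon could be configured roughly along these lines. This is an illustrative sketch only; the component names, bind address, port, and HDFS path are placeholders, not values from the thread.]

```
# Web servers use the Flume Log4jAppender to send events to this agent.
agent.sources = avroSrc
agent.channels = fileCh
agent.sinks = hdfsSink

# Avro source receiving events from the log4j appender.
agent.sources.avroSrc.type = avro
agent.sources.avroSrc.bind = 0.0.0.0
agent.sources.avroSrc.port = 41414
agent.sources.avroSrc.channels = fileCh

# Durable file channel so buffered events survive an agent restart.
agent.channels.fileCh.type = file

# HDFS event sink writing into a date-bucketed path (placeholder).
agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.channel = fileCh
agent.sinks.hdfsSink.hdfs.path = hdfs://namenode/flume/audit/%Y-%m-%d
```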