From: Jonathan Hsieh
Date: Wed, 10 Aug 2011 09:22:35 -0700
Subject: Re: Metadata parsing
To: flume-user@incubator.apache.org
Cc: Lior Harel
Reply-To: flume-user@incubator.apache.org
Mailing-List: contact flume-user-help@incubator.apache.org; run by ezmlm
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=20cf303bff6ccff12704aa2917d1

--20cf303bff6ccff12704aa2917d1
Content-Type: text/plain; charset=ISO-8859-1

Brian,

Here are some directions on how to contribute code:

https://cwiki.apache.org/confluence/display/FLUME/How+to+Contribute

They are new and in progress (new project infrastructure landed yesterday)
and likely have some bugs, so please provide feedback on them as well!

Thanks,
Jon.

On Tue, Aug 9, 2011 at 1:55 AM, Brian Tran wrote:

> I actually wrote an implementation last week. If no one else has already
> done it, how do I go about adding it?
>
> On Sat, Aug 6, 2011 at 3:25 AM, Lior Harel wrote:
>
>> Sure, let's do this. I'll join the dev mailing list and see if I can help
>> with the implementation.
>>
>> On Aug 5, 2011, at 6:34 PM, Jonathan Hsieh wrote:
>>
>> Lior,
>>
>> Ah, good point, I misspoke. Thanks for correcting me!
>>
>> Unfortunately, you are correct: Flume currently can't do this
>> out of the box.
>>
>> It seems like a reasonable addition, and a patch would be gladly accepted
>> if someone were to implement it. If you, Brian, or anyone else is
>> interested in building this, let's move the discussion to
>> flume-dev@incubator.apache.org!
>>
>> Thanks,
>> Jon.
>>
>> On Fri, Aug 5, 2011 at 1:30 AM, Lior Harel wrote:
>>
>>> Hi Jon,
>>> I'm interested in the same use case that Brian asked about, but I'm not
>>> sure I understand your answer. As far as I understand, the regex
>>> decorator can only extract data out of the event body, while the
>>> tailSrcFile attribute is part of the metadata. Can the regex decorator
>>> somehow operate on it?
>>>
>>> Lior
>>>
>>> On Aug 5, 2011, at 9:35 AM, Jonathan Hsieh wrote:
>>>
>>> [bcc flume-user@cloudera.org (deprecated), cc
>>> flume-user@incubator.apache.org]
>>>
>>> Brian,
>>>
>>> The easiest way is to use the regex decorator to create a new attribute
>>> and use that attribute to do output bucketing.
>>>
>>> http://archive.cloudera.com/cdh/3/flume/UserGuide/index.html#_extractors
>>>
>>> Jon.
>>>
>>> On Mon, Jul 25, 2011 at 5:50 PM, Brian Tran wrote:
>>>
>>>> I want to do output bucketing based on the tailSrcFile metadata value
>>>> set by the tailDir source. However, I only want part of the value for
>>>> the destination path in HDFS.
>>>>
>>>> For example, I have an event with the tailSrcFile value
>>>> "unwanted_prefix_category_name-2011-07-25.log" but only want to use
>>>> "category_name" for output bucketing.
>>>>
>>>> What is the easiest way to do this?

--
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// jon@cloudera.com

--20cf303bff6ccff12704aa2917d1
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Brian,

Here are some directions on how to contribute code:

https://cwiki.apache.org/confluence/display/FLUME/How+to+Contribute

They are new and in progress (new project infrastructure landed yesterday) and likely have some bugs, so please provide feedback on them as well!

Thanks,
Jon.


On Tue, Aug 9, 2011 at 1:55 AM, Brian Tran <briantran86@gmail.com> wrote:
I actually wrote an implementation last week. If no one else has already done it, how do I go about adding it?

On Sat, Aug 6, 2011 at 3:25 AM, Lior Harel <harel.lior@gmail.com> wrote:
Sure, let's do this. I'll join the dev mailing list and see if I can help with the implementation.

On Aug 5, 2011, at 6:34 PM, Jonathan Hsieh wrote:

Lior,

Ah, good point, I misspoke. Thanks for correcting me!

Unfortunately, you are correct: Flume currently can't do this out of the box.

It seems like a reasonable addition, and a patch would be gladly accepted if someone were to implement it. If you, Brian, or anyone else is interested in building this, let's move the discussion to flume-dev@incubator.apache.org!

Thanks,
Jon.

On Fri, Aug 5, 2011 at 1:30 AM, Lior Harel <harel.lior@gmail.com> wrote:
Hi Jon,
I'm interested in the same use case that Brian asked about, but I'm not sure I understand your answer. As far as I understand, the regex decorator can only extract data out of the event body, while the tailSrcFile attribute is part of the metadata. Can the regex decorator somehow operate on it?


Lior

On Aug 5, 2011, at 9:35 AM, Jonathan Hsieh wrote:


Brian,

The easiest way is to use the regex decorator to create a new attribute and use that attribute to do output bucketing.

http://archive.cloudera.com/cdh/3/flume/UserGuide/index.html#_extractors


Jon.

On Mon, Jul 25, 2011 at 5:50 PM, Brian Tran <briantran86@gmail.com> wrote:
I want to do output bucketing based on the tailSrcFile metadata value
set by the tailDir source. However, I only want part of the value for
the destination path in HDFS.

For example, I have an event with the tailSrcFile value
"unwanted_prefix_category_name-2011-07-25.log" but only want to use
"category_name" for output bucketing.

What is the easiest way to do this?
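
To make the requested transformation concrete, here is a minimal sketch in plain Python of the extraction step only. It is not Flume code and does not use any Flume API; as noted elsewhere in the thread, Flume could not apply a regex to the tailSrcFile metadata out of the box at the time. The pattern, the function names (extract_category, bucket_path), and the HDFS base path are all hypothetical, and the sketch assumes "unwanted_prefix_" is a fixed literal prefix and that the category is always followed by "-YYYY-MM-DD.log".

```python
import re

# Hypothetical pattern (an assumption, not from the thread): a literal
# "unwanted_prefix_" prefix, then the category, then "-YYYY-MM-DD.log".
TAIL_SRC_FILE_PATTERN = re.compile(
    r"^unwanted_prefix_(?P<category>.+)-\d{4}-\d{2}-\d{2}\.log$"
)


def extract_category(tail_src_file):
    """Return the category part of a tailSrcFile value, or None if it does not match."""
    match = TAIL_SRC_FILE_PATTERN.match(tail_src_file)
    return match.group("category") if match else None


def bucket_path(base, tail_src_file, default="unknown"):
    """Build an output directory from the extracted category.

    Falls back to `default` when the file name does not match the pattern.
    """
    category = extract_category(tail_src_file) or default
    return f"{base.rstrip('/')}/{category}/"


if __name__ == "__main__":
    name = "unwanted_prefix_category_name-2011-07-25.log"
    print(extract_category(name))
    print(bucket_path("hdfs://namenode/flume", name))
```

A real solution would need the same logic to run inside Flume against the event's metadata attribute rather than its body, which is the missing piece discussed in the replies.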



--
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera





--
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera






--
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
--20cf303bff6ccff12704aa2917d1--