Subject: Re: Metadata parsing
From: Lior Harel
Date: Sat, 6 Aug 2011 13:25:23 +0300
To: flume-user@incubator.apache.org
Cc: Brian Tran

Sure, let's do this. I'll join the dev mailing list and see if I can help with the implementation.

On Aug 5, 2011, at 6:34 PM, Jonathan Hsieh wrote:

Lior, 

Ah, good point, I misspoke. Thanks for correcting me!

Unfortunately, you are correct: Flume currently can't do this out of the box.

It seems like a reasonable addition, and a patch would be gladly accepted if someone were to implement it. If you, Brian, or anyone else is interested in building this, let's move the discussion to flume-dev@incubator.apache.org!

Thanks,
Jon.

On Fri, Aug 5, 2011 at 1:30 AM, Lior Harel <harel.lior@gmail.com> wrote:
Hi Jon,
I'm interested in the same use case that Brian asked about, but I'm not sure I understand your answer. As far as I understand, the regex decorator can only extract data out of the event body, while the tailSrcFile attribute is part of the metadata. Can the regex decorator somehow operate on it?


Lior 

On Aug 5, 2011, at 9:35 AM, Jonathan Hsieh wrote:


Brian,

The easiest way is to use the regex decorator to create a new attribute and use that attribute to do output bucketing.

http://archive.cloudera.com/cdh/3/flume/UserGuide/index.html#_extractors


Jon.

On Mon, Jul 25, 2011 at 5:50 PM, Brian Tran <briantran86@gmail.com> wrote:
I want to do output bucketing based on the tailSrcFile metadata value
set by the tailDir source. However, I only want part of the value for
the destination path in HDFS.

For example, I have an event with the tailSrcFile value
"unwanted_prefix_category_name-2011-07-25.log" but only want to use
"category_name" for output bucketing.

What is the easiest way to do this?
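
To make the requested transformation concrete, here is a minimal sketch in plain Python. It is not Flume code and does not use any Flume API. It assumes file names follow the shape "<prefix>_<category>-YYYY-MM-DD.log" with a prefix known in advance; the pattern, the function names, and the derived "category" attribute are all hypothetical and exist only for illustration.

```python
import re

# Hypothetical pattern: assumes the literal prefix "unwanted_prefix_" and a
# trailing "-YYYY-MM-DD.log" date suffix, as in the example file name above.
PATTERN = re.compile(r"^unwanted_prefix_(?P<category>.+)-\d{4}-\d{2}-\d{2}\.log$")


def bucket_for(tail_src_file):
    """Return the category part of a tailSrcFile value, or None if it does not match."""
    match = PATTERN.match(tail_src_file)
    return match.group("category") if match else None


def with_bucket(attributes):
    """Return a copy of the event attributes with a derived 'category' attribute.

    The attribute name 'category' is an assumption; the input dict is left untouched.
    """
    derived = dict(attributes)
    category = bucket_for(attributes.get("tailSrcFile", ""))
    if category is not None:
        derived["category"] = category
    return derived
```

A decorator that operated on metadata attributes rather than on the event body would need to do roughly this: read tailSrcFile, apply a pattern, and write the captured group to a new attribute that the output path can then reference.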



--
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera




